MarcelWaldvogel commented 2 years ago

Maybe generate a text/vcard representation out of the structure

ztiromoritz commented 2 years ago

Hey @MarcelWaldvogel nice library. I had to parse vcard data in a JS/TS Frontend and it works very well.

How would the interface of such a generator look like? Something like this? toTextVCard(parsedVCard:ParsedVCards):string

How would you handle nags in the parsed card. For example, would you keep an "Unescaped comma in value" and keep it unescaped in the result?

MarcelWaldvogel commented 2 years ago

The function I started writing has the following prototype:

export function stringifyVCard(vcard: VCard4): string;

I planned to ignore the nags, but you certainly have a point.

Generally, I found the escaping rules in the RFC to have some ambiguity (see e..g. the TYPE comment in the README and there is way too much old code and data around (see also the PHOTO comma escaping errata).

So, for the single value case, I was planning to not escape commas, as old parsers would not like it escaped and even new ones probably would accept the unescaped comma, for compatibility.

I never thought of using the nags to (somewhat) faithfully reproduce the original context. Right now, I would say they are too fine-granular (single property or even a single field); if some compatibility options should be made available, I would prefer them to be more general (e.g., "do not escape any single values"), as I think the goal should be to create "good" output (e.g., maximize standards compliance and compatibility, which might not be the same thing every time), and not to faithfully replicate bad input. But as I do not have a concrete use case for the generation, I am absolutely open to comments.

So, the prototype might end up to be more like:

export function stringifyVCard(vcard: VCard4, encodingOptions?: …): string;

or even

export function stringifyVCard(vcard: Omit<VCard4, "nags" | "hasErrors">, encodingOptions?: …): string;

MarcelWaldvogel commented 2 years ago

@lenalebt: Would you have a preference?

lenalebt commented 2 years ago

I'm by no means an expert on the topic :). I read a few things in the past weeks around VCard handling and how one should export VCards, but more from a backend perspective (where I am currently using https://github.com/mangstadt/ez-vcard). Maybe you already know that resource, I found it quite useful and it has a few hints about how one should or should not implement VCard exports: https://sabre.io/dav/building-a-carddav-client/#retain-full-vcards%21

They basically say "don't try to reproduce the VCard from an internal representation, most probably you will run into problems". Having this in mind: I do have some doubts about using the nags to reproduce what was there initially, as it basically is the approach they recommend not doing. Not using them probably also would be problematic though :D. But it is based on the above article, and not on personal experience so far as I was able to circumvent implementing that myself.

Full disclouse: I'm working closely with @ztiromoritz and am more of a backend person :-).

MarcelWaldvogel commented 2 years ago

I did not know about Sabre pointer; thanks, it confirms my intuition. (Also reminds me of a discussion with the DAVx⁵ guys a few years ago about CardDAV synchronization issues; I guess they would take a similar stance.)

Thanks for taking the time to respond!

ztiromoritz commented 2 years ago

I read the article @lenalebt mentioned above more like: "don't use your own usecase specific datamodel to recreate the vcard. If you have to write text/vcard then you have to focus on that, and keep all the odds and nags you found in the source, and keep it as untouched as possible"

But I understand the intention of this library here as exactly that: A vCard focused datamodel.

For me this concludes to:

Write back what you read. So even if you nag about missing escaped commas, write back the same unescaped commas. Especially on the more uncommon fields.
An open question might be: Is this library responsible for escaping und unescaping the values? I would think: no it is not. Escaping and unescaping would be outside of this library scope or a different layer above the ParsedVCard. So for example: As long as you don't add CRLF to your NOTES, or sth. other that would destroy the whole vCard structure, the parser & generator only nags but doesn't correct you.
To preserve as much information as possible, storing the order of the entries, could be helpfull to recreate the original card.

ztiromoritz commented 2 years ago

FYI How I currently use this library: I have to read the vcard and may be change some values based on user input in frontend.

READ:

I parse the whole structure and extract the information I need from the parsedVCard to present it to the user

WRITE:

I split the raw file line by line
- i.e. logical lines. So I have to unfold overlong lines. I stole your: vcf.replace(/\r?\n[ \t]/g, '');
I identify the raw lines I have to change using simpel string match and your parseLine function.
Remove the line(s) that should be changed
Add new line(s) in the position of the removed one(s)
- To produce this new lines, I have to write some utils and make some assumptions. Here a writeLine function corresponding to parseLine would be interesting.
Write back using foldline(line, 75) ( foldline and a simple lines.join('\r\n')
So the unchanged lines should appeare as untouched as possible, except:
- I don't keep the original newlines which could be non standard '\n'.
- I unfold and fold which might change the line width of the source vcard, if the source card was non standard.
And thanks in return, this discussion is very helpfull for my current use case.

MarcelWaldvogel commented 2 years ago

Ah, thanks for the description of the use case (and the hoops you have to be jumping through). So, right now, you do a "minimally invasive" approach, which cannot be easily generalized.

Goals

Let me propose a more general approach:

It should be easy to modify data
The existing structure and data should be maintained, whenever possible; especially, if nothing is modified there
Preserving "illegal" formatting is not necessary, but still, maintaining compatibility with the existing file structure should be striven for

Requirements

I envision three steps along the lines to reach this goal, which I would prefer instead of reconstructing unclear or bad formatting from the tags:

While parsing, store all lines which contain errors (and thus do not end up in the final output) in a separate array, so they can be inspected by application code (probably rare) and reproduced in output (probably desirable)
While parsing, store all lines in their original form (after unfolding) together with the property. On output, if the structured data has not been modified, output the original line
While parsing, store all values (for both parameters and properties) together with structured value. On output, if the value has not been modified, output the original value

I think 1 would be a good thing to have anyway and easy to implement.

2 would be good to have for anyone to modify the file and might be reasonable to implement.

However, I believe the cost-benefit factor of 3 to be bad (amount of code and complexity vs. actual need to preserve this level of detail; especially given the fact that any third party relying on this behaviour violates the standard and probably should fix their code. And yes, I know that this too often is outside of our control.)

What do you think?

Implementation options

For 2 (and also for 3, at a finer granularity) I see the following options:

User-driven (simple): There is a JS property, tentatively called rawLine holding the raw line (maybe only if any nags were added while parsing that line?). The library user needs to delete the property, when making any changes. This is simple and efficient, but error-prone. (A compromise might be to have a parse flag, preserveLines, which controls whether rawLine is filled. So anyone not requiring the detailed preservation would not have to care.)
Getter/setter (complicated): Any modification has to go through a setter or similar, which automatically deletes rawLine. This requires less intuitive access and moving away from the current "just a bunch of data" approach.
Automatic (more expensive): rawLine is a tuple, holding (a) the raw line and (b) a copy of the structured data. On output, the two data structures are compared; if they are equal, the raw line is written, otherwise a line freshly generated.

I like the "user-driven compromise" (if you want lines to be preserved, specify preserveLines and delete rawLine on every vCard property you modify) and the "automatic mode" best. If you think you can handle the deletion, I would go to the "user-driven compromise" (besides storing all the lines that were not parsed). What is your take?

lenalebt commented 2 years ago

I would agree with your goals.

My gut feel says "requirements option 2 feels sufficient" and you could still go for option 3 in case it turns out to be necessary.

Wrt implementation options, to me the user-driven simple solution does not have that much appeal - but I purely am looking at it from the user's perspective. Of course it would be the easiest to implement.

Would the "setter" implementation really be that complicated? To me it sounds like a natural thing to do. I'm really not deep into TypeScript land, and risking to come across as "the emperor without clothes on"...: as far as I understand https://www.typescriptlang.org/docs/handbook/2/classes.html#getters--setters for a user it would still seem like "just a bunch of data"? Of course it would be more to do wrt implementation effort.

MarcelWaldvogel commented 2 years ago

AFAIK, if you want to do anything on e.g. vCard.TEL.value = '123' (as opposed to vCard.TEL.set('value', '123')), you would have to replace each object with a Proxy to that object first.

So, during parsing, you want the real objects (otherwise, you will have to fight your own setter trying to delete the rawLine on anything you do), but before returning the object, create a Proxy for each of them, which will intercept the modifications. On every access, the Proxy's functions will be called.

I think I prefer the "copy" approach to that.

lenalebt commented 2 years ago

Understand, thanks for clarification and sorry for my ignorance maybe :).

I have a few ideas about how one could still simplify usage, but I'm not sure yet whether it would really make things simpler. It's basically structured around the "lens" idea from functional programming, basically having something that will do the modification for you. It would make it possible for you to go with the simple approach, but adding a way to have the "complex" modification made for you. I'll try it out on monday with @ztiromoritz.

MarcelWaldvogel commented 2 years ago

Optics seems to be topic I have been running into quite frequently in the past few months and I would like to get my hands dirty with them one of these days. Therefore, I would greatly appreciate any pointers in this direction.

However, I do not think anymore that the "copy" approach is that expensive: remembering the JSON.stringify() of the vCard property together with the raw line is probably OK, as it should be able to recognize all the changes that should trigger forgetting the raw line. So don't bother if you should be unhappy with the result of your lens discussion.

The only downside the JSON.stringify() approach has is with dealing with huge properties, such as PHOTOs, as this prevents the reuse of the value string.

MarcelWaldvogel commented 2 years ago

BTW: What kind of API would you prefer?

Should parseVCard() always return this annotation?
Is an additional call to makePreservable() (or similar) better?

The latter could help reduce storage and (browser) code size to those users who just want to parse the values, with no intention to write it back almost-as-is, but is less elegant. (If JSON.stringify() or optics work out, then the additional code might be so little, that a flag to parseVCard() might be the best of both worlds.)

MarcelWaldvogel commented 2 years ago

It turns out, storing a JSON.stringify() version of the object along with it is not necessary. The rawLine can just be reparsed at export time and the result checked for deep equality. For faithful recreation, the rawLine is a must anyway, so no space wasted. (One simple option for the deep equality test of course would still be to JSON.stringify() both objects.)

lenalebt commented 2 years ago

Okay, so I guess you won't need the optics version then?

I'll shortly present the optics idea anyways, although I did not have a chance yet to talk to @ztiromoritz about it, just for the sake of nerd-sniping you into it :D. I have not used the typescript libs here and seldomly code typescript (I'm more at home in Scala / Kotlin), trying to write it down - take it rather as pseudocode here. This is taken from https://github.com/gcanti/monocle-ts/blob/master/README.md where I was linked from your optics reference.

import { Lens } from 'monocle-ts'
const name = Lens.fromPath<Employee>()(['company', 'address', 'street', 'name']);
const capitalizeName = name.modify(capitalize);

capitalize(employee)

The return value of the last one would be a changed object.

Now apply a slight modification (and yep, it's mutable over here, this is not what lenses are made for, just to get the idea across):

import { Lens } from 'monocle-ts'
const companyLens = new Lens<Employee, Company>
  employee => employee.company,
  company => employee => {
    employee.company = company;
    employee.rawLine = null;
  } )
);

const companyNameLens = Lens.fromPath<Company>()(['address', 'street', 'name']);
const nameLens = companyLens.compose(companyNameLens);
nameLens.set("Foostreet");

I seldomly code TypeScript, and just have written this down inline in this editor, so sorry if it does not compile.

You could now just hide that lens API and make an own fromPath version that will simply do that first step that I'm doing manually for the companyLens. You could then use it like so:

let vcard = //....
MyLens.fromPath<VCard>()(['address', 'street', 'name']).set("Foostreet")(vcard)

This is how you'd use one of those lenses. My idea basically was to not use proxy objects everywhere, but just have a lens for the very first object level that would not only change the value, but also remove that rawLine.

MarcelWaldvogel commented 2 years ago

Thank you very much for the inspirational writeup, which helped me understand Lenses better (e.g., I had forgotten about the immutable part). Do I understand it right lenses are essentially a comfortable way to achieve copy-on-write (CoW) semantics in structured data?

So, if the code wanted to be notified about changes, we either would need to make the current data structure read-only (Object.freeze or readonly attribute to the fields), or add a field setter (which in JS/TS would require a Proxy). (I'll call the object properties "fields" here to distinguish them from vCard properties.) Otherwise, we have no way of enforcing the developer to use fromPath(). (The readonly attribute is no help, as this is checked only at compile time, not at run time; so we cannot replace the immutable object with a mutable one later, which would be easy for Java (and probably also Scala or Kotlin).)

The Lens approach in TS/JS has the (syntactical) disadvantage that you can no longer simply write vCard.ADR[0].value.postalCode, all with the help of autocompletion, but the much longer vCard.fromPath(['ADR', 0, 'value', 'postalCode']), probably even without autocompletion support. Hmm…

⌚ [Time passes]

After brooding over the CoW concept, how about the following:

The compile-time type of all vCard properties is readonly (not the top-level dictionary, not the arrays, as we do not need to know when they are modified)
Each property has a modify() method, which just does a compile-time type cast, returning the very same object, but with its field's type changed to read-write (this all happens only in the type system, no code is generated for this and no copying is done) and deleting rawLine.

So, the fields can still be accessed (read-only) as vCard.ADR[0].value.postalCode, however, assigning to them is prevented at compile time. Also, one can still do add/replace entire properties vCard.ADR[1] = {value:{postalCode: '12345', …}}, vCard.BDAY = {value: '1999-12-31'} using the current syntax. Only the modification of data of existing properties requires one to go through vCard.ADR[0].modify().value.postalCode = '98765'. This seems both much cleaner and faster than having to parse and twice JSON.stringify() every property again on output.

@ztiromoritz, @lenalebt: What do you think? (And many, many thanks for the inspiration!)

MarcelWaldvogel commented 2 years ago

Would have been too good to be true. The creation of new/replacement properties would now be like vCard.BDAY = new FaithfulProperty({value: '1999-12-31'}), as the modify() method now needs to be in every vCard property.

Options:

Make modify() a global function.
Add a vCard.set(propName, propInfo) method (and its array counterpart), which does vCard[propName] = new FaithfulProperty(propInfo)
Go with Lenses
Go with the double-parse-and-JSON-stringify approach
…?

johanneshiry commented 2 years ago

Hi everyone, first of all thanks for the library. It solves some issues I have over here quite well. However, I also have the need to convert parsed vCards back to valid vCard strings. Therefore I'm also interested in such a generator. Is there any progress on this issue expected soon? I'm afraid I'm a total noob in javascript/typescript and therefore cannot contribute much on the code side. However, in my backend I use this vCard parser library (https://github.com/mangstadt/ez-vcard) which also has a "generator" (called writer here) which may be usesful as a blueprint for the corresponding javascript implementation?

MarcelWaldvogel / vcard4-ts

Include a generator #3

Goals

Requirements

Implementation options