Open themerekat opened 2 days ago
@themerekat thanks for your prompt reply and for opening/linking the issue in the BioKIC tracker. I am curious to hear how your team is going to approach this export issue, which appears to prevent the flow of records beyond Symbiota for quite a few collections.
After further investigation, this appears to happen only when the double quotes are part of a JSON object (the problem is not seen in other cases of double quotes). @jhpoelen, this helps to explain why it hasn't been obvious in the past (because it's relatively rare).
@themerekat thanks for looking into this. From the perspective of a CSV parser, the text in a field is just that: text. So, I am a little confused about why this affects only certain JSON snippets embedded in a CSV field value.
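To illustrate the point that a CSV parser treats a field as plain text, here is a minimal sketch using Python's standard `csv` module. The column name `dynamicProperties` and the JSON content are made up for illustration and are not taken from the affected export. When the writer doubles the inner quotes, the JSON snippet should round-trip unchanged:

```python
import csv
import io
import json

# Hypothetical field value: a JSON snippet that contains double quotes.
snippet = json.dumps({"note": 'labelled "type" specimen'})

# Write a header and one record with the default (Excel-style) dialect.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "dynamicProperties"])
writer.writerow(["1", snippet])

# The writer wraps the field in quotes and doubles every inner quote.
exported = buffer.getvalue()

# Reading the export back recovers the original text of the field.
rows = list(csv.reader(io.StringIO(exported)))
round_tripped = rows[1][1]

print(round_tripped == snippet)
print(json.loads(round_tripped))
```

In this sketch the JSON is never interpreted by the CSV layer; it only needs its double quotes escaped the same way as in any other field.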
Please note that even if this is relatively rare (I actually have some more examples), the impact is that records associated with the affected datasets are at risk of being unavailable through national and international data networks. As a result, valuable collection records might be hidden.
As you know, I have a method (based on open source tools and open data) to detect and pinpoint these issues.
Are you planning to fix this high-impact CSV export issue?
@jhpoelen, my understanding is that the culprit is whatever we use to create the JSON snippets in the database: it encodes things differently from the way everything else is encoded.
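As a rough sketch of how an encoding mismatch like this could break a record: the thread does not say what the actual encoding is, so the backslash-escaped quotes below are purely an assumption, and the column names and values are illustrative. With Python's default `csv` dialect, a field whose inner quotes are doubled parses cleanly, while the same field with backslash-escaped quotes is split at the comma inside the JSON:

```python
import csv
import io

# Illustrative header; not taken from the actual export.
header = ["id", "dynamicProperties", "basisOfRecord"]

# Inner quotes doubled, as standard CSV readers expect.
doubled = '1,"{""a"":""x,y""}",PreservedSpecimen\n'
# ASSUMPTION: inner quotes backslash-escaped instead of doubled.
backslashed = '1,"{\\"a\\":\\"x,y\\"}",PreservedSpecimen\n'


def parse_line(line):
    """Parse one CSV line with Python's default (Excel-style) dialect."""
    return next(csv.reader(io.StringIO(line)))


good_row = parse_line(doubled)
broken_row = parse_line(backslashed)

# The doubled variant yields one value per header column.
print(len(good_row) == len(header), good_row[1])
# The backslashed variant no longer lines up with the header.
print(len(broken_row) == len(header), broken_row)
```

Comparing the number of parsed fields against the header, as done here, is one simple way to flag suspect records; other parsers may fail differently on the same input.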
@themerekat thanks for clarifying. Sounds like a bug to me... but hey, I am not the one fixing it ;)
That's why the issue is labeled as "bug"!
Touché!
Again, thanks for your prompt reply and looking into this issue. I realize that you probably have a lot on your plate.
This blocks data flow to GBIF, for example.
See description here: https://github.com/jhpoelen/cite-the-bunnies/issues/1