gimmefreshdata / source-reeflifesurvey

fresh data configuration for Reef Life Survey
MIT License

meta.xml invalid xml #2

Closed jhpoelen closed 8 years ago

jhpoelen commented 8 years ago

Please see http://tools.gbif.org/dwca-validator/ and http://tools.gbif.org/dwca-reports/231-4079204705948353109.html . Same pattern was observed in the neon archive.

I'll try to add some meaningful error messages in jenkins logs.

jhammock commented 8 years ago

Hmm... well, there was another odd quotation mark (apparently I introduce them. The fixed ones were still fixed; this one was for a new column). I fixed that, but it's still failing, now upon "unpack".

http://tools.gbif.org/dwca-validator/ now says:

Archive could not be read

org.gbif.dwca.io.UnkownDelimitersException: Unable to detect field delimiter
org.gbif.io.CSVReaderFactory.buildArchiveFile(CSVReaderFactory.java:129)
org.gbif.io.CSVReaderFactory.build(CSVReaderFactory.java:46)
org.gbif.dwca.io.ArchiveFactory.readFileHeaders(ArchiveFactory.java:396)
org.gbif.dwca.io.ArchiveFactory.openArchiveDataFile(ArchiveFactory.java:301)
org.gbif.dwca.io.ArchiveFactory.openArchive(ArchiveFactory.java:320)
org.gbif.dwca.action.ValidateAction.validateArchive(ValidateAction.java:902)

No further reference to line 3, column 91, etc.

I thought I delimited with tabs, and my mere mortal skills cannot detect the difference between these characters and the ones in the NEON archive, which is now working.

It's a shame about the character sensitivity. Not sure who invented different kinds of tabs and quotation marks. Those of us who write for human consumption can only detect one of each in our keyboards, and they only make life difficult for writers for machine consumption...
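For anyone else who cannot see the difference by eye: a minimal sketch, assuming Python 3 is at hand, that reports where non-ASCII characters (curly quotes, non-breaking spaces and the like) sit in a piece of text. The function name `find_suspect_chars` and the sample string are made up for illustration; this is not part of the validator.

```python
import unicodedata

def find_suspect_chars(text):
    """Return (line, column, Unicode name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical snippet with curly quotes around an attribute value.
sample = '<archive>\n  <core encoding=\u201cUTF-8\u201d>\n</archive>'
for hit in find_suspect_chars(sample):
    print(hit)
```

To check a real file, read meta.xml as UTF-8 text and pass its contents in place of `sample`.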

jhpoelen commented 8 years ago

Bummer! If you attach the current meta.xml, I'll have a look and suggest a "fixed" version.

jhammock commented 8 years ago

oops, darn. That was extra clumsy of me. Fetching the file from my dropbox I had given it something odd rather than a zip file (a textclipping? Never heard of them.) Anyway, I zipped the files afresh, they are here: https://www.dropbox.com/s/1bt80109923z89l/reeflifesurvey.zip?dl=1

and they're back to the original error. I reopened the archive to inspect the meta, and the quotes do all seem to be fixed now. The changes I made from NEON (my template) are in the new Family term (field index=10) and a different made up URI in field index=0

jhpoelen commented 8 years ago

you had a malformed meta.xml . . . I fixed the malformedness by adding fieldsEnclosedBy="" .

See attached meta.xml.txt and rename to meta.xml (github doesn't like xml). meta.xml.txt

better?
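A rough way to spot this kind of omission before uploading, sketched in Python under the assumption that meta.xml uses the Darwin Core text namespace and a single core element. The helper `missing_dialect_attrs` and the sample XML are hypothetical; the sample is not the actual reeflifesurvey file.

```python
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text guidelines used by meta.xml.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"
DIALECT_ATTRS = ("fieldsTerminatedBy", "linesTerminatedBy", "fieldsEnclosedBy")

def missing_dialect_attrs(meta_xml):
    """List the delimiter/quoting attributes that <core> does not declare."""
    root = ET.fromstring(meta_xml)  # raises ET.ParseError if not well-formed
    core = root.find(DWC_TEXT_NS + "core")
    if core is None:
        raise ValueError("no <core> element found")
    return [name for name in DIALECT_ATTRS if name not in core.attrib]

# Hypothetical, trimmed-down meta.xml without fieldsEnclosedBy.
sample_meta = r'''<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
  </core>
</archive>'''

print(missing_dialect_attrs(sample_meta))
```

A file that is not well-formed at all (a stray curly quote, say) makes `ET.fromstring` raise `ET.ParseError`, whose message includes the line and column.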

jhammock commented 8 years ago

Arg! Better but failing on dwc2parquet. Validator says success, but no metadata description found, if that matters? http://tools.gbif.org/dwca-reports/231-7851024029041127190.html

https://www.dropbox.com/s/oa7cji06c2stcrf/reeflifesurvey.zip?dl=1

jhpoelen commented 8 years ago

Our friendly validator said "The data file contains 422,946 rows with 1 columns." which made me wonder . . . and I found that the value of the delimiter attribute in meta.xml, /t , should be \t . (the backslash \ is an escape character; the forward slash / is not)

xml is fun right?
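To illustrate why /t gives "1 columns": a small Python sketch, assuming a reader that turns the two-character text \t into a real tab and takes anything else literally. The names `KNOWN_ESCAPES` and `column_count` and the header line are invented for the example; the real reader may well behave differently.

```python
# Two-character escape sequences as they appear in meta.xml, mapped to real characters.
KNOWN_ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r"}

def column_count(header_line, declared_delimiter):
    """Split a header line on the delimiter as declared in meta.xml."""
    delimiter = KNOWN_ESCAPES.get(declared_delimiter, declared_delimiter)
    return len(header_line.split(delimiter))

# Hypothetical tab-separated header line.
header = "id\tscientificName\tfamily"
print(column_count(header, "/t"))   # literal slash + t never occurs, so 1 column
print(column_count(header, "\\t"))  # backslash + t is read as a tab, so 3 columns
```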

jhammock commented 8 years ago

I'm at a loss. Latest fixes are in, validating, failing at unpack.

http://tools.gbif.org/dwca-reports/231-6471616325711396872.html

https://www.dropbox.com/s/faxqx2bqf1up1sg/reeflifesurvey.zip?dl=1

jhpoelen commented 8 years ago

@jhammock the error you saw was not related to the reeflifesurvey archive. The weather looks pretty good today at http://archive.effechecka.org (screenshot, 2016-08-18 at 2:52:08 pm)