Closed. tucotuco closed this issue 10 years ago
Here's output for one small resource:
https://www.dropbox.com/s/tmsna3h50ckbn5a/msbobs_mamm.txt
That looks OK to me. The only oddity is that the last field is empty, followed by a line break. That's not strictly a problem, since the dwca-indexer should be splitting on line breaks first, and then on tabs.
I see this at the end of a record:
\tICZN\t\t\t\t\t\t\t\t\t\t\t\t\t\tsex: female\t\n
p.s. that's prior to splitting on linebreaks.
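To illustrate why the order of splitting matters here, a minimal Python sketch follows. The `raw` string is a shortened, hypothetical stand-in for the record tail quoted above, and the variable names are made up for the example; this is not the indexer's actual code.

```python
# Hypothetical, shortened tail of a record, modelled on the fragment quoted above.
raw = "ICZN\t\t\tsex: female\t\n"

# Splitting on tabs alone leaves the line break as the value of the last field.
tab_only = raw.split("\t")

# Splitting on line breaks first, then on tabs, leaves the last field empty.
lines_then_tabs = [line.split("\t") for line in raw.splitlines()]

print(repr(tab_only[-1]))            # the stray line break
print(repr(lines_then_tabs[0][-1]))  # an empty string
```

If the record is split on tabs before the line break is removed, the trailing field ends up holding '\n' rather than being empty.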
Gulo is correct. The solution was implemented in dwc-indexer by replacing non-printing characters with a space, then trimming.
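A rough Python sketch of that kind of cleanup is below. The function name `clean_field` is hypothetical, and treating "non-printing" as the ASCII control characters is an assumption; the actual dwc-indexer code may differ.

```python
import re

# Assumption: "non-printing" means ASCII control characters (0x00-0x1F and 0x7F).
_NON_PRINTING = re.compile(r"[\x00-\x1f\x7f]")


def clean_field(value):
    """Replace each non-printing character with a space, then trim the ends."""
    return _NON_PRINTING.sub(" ", value).strip()


# A field whose only content is a line break becomes an empty string.
print(repr(clean_field("\n")))
# Control characters inside a value become single spaces.
print(repr(clean_field("sex:\tfemale\r\n")))
```

Since a tab is itself a control character, a cleanup like this would have to run on individual field values after the record has been split on tabs, not on the whole line.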
The symptoms appeared in the VN portal download (as a broken line following namepublishedinyear) until pull request https://github.com/VertNet/webapp/pull/410. They still appear in record details, where namepublishedinyear shows in the list of fields on the all terms tab, but without an apparent value. The actual value is '\n'.
Suspect the problem might be harvest-fields processing in https://github.com/VertNet/gulo/blob/develop/src/clj/gulo/fields.clj#L64.