Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

The zip file did not contain metadata/jsoncatalog.txt #50

Closed: tpmccallum closed this issue 9 years ago.

tpmccallum commented 9 years ago

Hi, I am getting the error "The zip file did not contain metadata/jsoncatalog.txt" when creating a bookworm at < http://bookworm.culturomics.org/create.php >. My zip file is on Dropbox here: < https://www.dropbox.com/s/cohvbk3kdr4xcjl/out11feb.zip?dl=0 >.

As you can see, there is a jsoncatalog.txt file in there, so perhaps it does not like something about its contents. My guesses are that it wants an absolute path to each text file instead of a placeholder, or that it wants each JSON entry on a new line (the documentation says there should be no newline characters, etc.). I'm not sure how to fix it.

Your assistance is greatly appreciated.

Tim

tpmccallum commented 9 years ago

I have also tried substituting the babynames jsoncatalog.txt file into my zip, with no luck: I get the same error saying that the zip file does not contain metadata/jsoncatalog.txt.
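One way to rule out a layout problem locally is to list the archive's contents with Python's standard `zipfile` module. This is a minimal sketch that assumes, from the wording of the error message, that the uploader looks for the exact member name `metadata/jsoncatalog.txt` at the root of the archive; under that assumption, a catalog nested under an extra top-level folder would be reported as missing. The helper `find_catalog` is hypothetical, not part of Bookworm.

```python
import io
import zipfile

# Path named in the error message; assumed to be relative to the archive root.
EXPECTED = "metadata/jsoncatalog.txt"

def find_catalog(zip_source):
    """Return (found_at_expected_path, other_candidates) for a zip path or file-like object."""
    with zipfile.ZipFile(zip_source) as zf:
        names = zf.namelist()
    found = EXPECTED in names
    # Catalogs that exist but sit somewhere else, e.g. under an extra top-level folder.
    others = [n for n in names if n.endswith("jsoncatalog.txt") and n != EXPECTED]
    return found, others

# Demo: build an in-memory zip whose catalog sits under an extra folder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("out11feb/metadata/jsoncatalog.txt", '{"filename": "doc1"}\n')
buf.seek(0)
print(find_catalog(buf))  # (False, ['out11feb/metadata/jsoncatalog.txt'])
```

Passing the path of a local zip file instead of the in-memory buffer works the same way, since `zipfile.ZipFile` accepts either.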

bmschmidt commented 9 years ago

If the culturomics site is working for uploads (which I'm not sure it is; I will check for you), I suspect that the problem lies where you suggest here:

it wants each json entry on a new line (documentation says there should be no return line characters etc.

Each json entry actually should be on a separate line, but a json object should not contain any newlines. The documentation should be changed to make this clear.
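A minimal Python sketch of writing a catalog in this one-object-per-line layout. The field names (`filename`, `title`, `year`) and the helper `to_catalog_lines` are hypothetical placeholders, not the fields Bookworm requires. `json.dumps` with default settings emits each object on a single line and escapes any newline or tab inside a string value as `\n` or `\t`, so a stray line break in a title cannot split a record across lines.

```python
import json

# Hypothetical metadata records; the field names are placeholders for illustration.
records = [
    {"filename": "doc1", "title": "First document", "year": 1901},
    {"filename": "doc2", "title": "A title with a\nline break", "year": 1902},
]

def to_catalog_lines(records):
    """Serialize each record as one JSON object on its own line."""
    # json.dumps without indent= produces a single line and escapes
    # embedded newlines/tabs in string values as \n and \t.
    return "".join(json.dumps(record) + "\n" for record in records)

catalog_text = to_catalog_lines(records)

with open("jsoncatalog.txt", "w", encoding="utf-8") as f:
    f.write(catalog_text)
```

The key design choice is to never pass `indent=` to `json.dumps` here: pretty-printing is exactly what spreads one object over several lines.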

If that's the case, it's probably being compounded by a second error where json catalogs that don't parse are being referenced as not existing.
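To tell the two failure modes apart before uploading, a catalog can be checked line by line. This is a minimal Python sketch assuming one JSON object per line as described above; whether Bookworm itself tolerates blank lines is not confirmed here, so this sketch simply skips them. The function name `validate_catalog_lines` is hypothetical.

```python
import json

def validate_catalog_lines(lines):
    """Check that every non-blank line is a standalone JSON object.

    Returns a list of (line_number, message) problems; an empty list means
    every line parsed. Reporting parse failures separately keeps them from
    being confused with a missing file.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # this sketch skips blank lines, e.g. after a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, "valid JSON but not an object"))
    return problems

# A pretty-printed object split across two lines, followed by a correct line.
sample = [
    '{"filename": "doc1",',
    ' "year": 1901}',
    '{"filename": "doc2", "year": 1902}',
]
print(validate_catalog_lines(sample))  # reports problems on lines 1 and 2
```

To check a real file, pass it line by line, e.g. `validate_catalog_lines(open("jsoncatalog.txt", encoding="utf-8"))`.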

tpmccallum commented 9 years ago

Hi, thanks for your reply. Would you like me to contribute to the documentation as I go ahead creating my own bookworm? Also, I am writing a Python program to fetch PDF files from the web for use in Bookworm; the preliminary code is at < https://github.com/tpmccallum/bookworm-pdf-harvester >.

bmschmidt commented 9 years ago

Don't worry about fixing the documentation, we can do it. But if you feel the urge to file any more reports, please do act on it and let us know where the code or docs could be improved.

(Sent by email on Wednesday, February 11, 2015, in reply to tpmccallum's comment above.)

tpmccallum commented 9 years ago

Hi, just for reference, here are the lines in your documentation at < https://github.com/Bookworm-project/BookwormDB > that need changing (and thanks for your speedy reply): "There should be no new line or tab characters in this file." "There should be no new line or tab characters in the JSON object."

tpmccallum commented 9 years ago

Hi, I found the cause of the "The zip file did not contain metadata/jsoncatalog.txt" error.

It turns out that a plain URL such as < asdf.com/asdf/asdf.zip > works; the issue was that I was using a Dropbox URL like < https://www.dropbox.com/s/bhn5pc52hiik9x1/BookwormDB.zip?dl=0 >. I am guessing that Bookworm fails to parse the query string part.
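Bookworm's upload code is not shown in this thread, so the following is only an illustration of the kind of failure being guessed at. A hypothetical handler that takes the text after the last `/` as the file name ends up with `BookwormDB.zip?dl=0`, which no longer ends in `.zip`, while parsing the URL with Python's `urllib.parse.urlsplit` separates the path from the query string. Both helper functions below are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def naive_filename(url):
    # Hypothetical naive approach: everything after the last "/" keeps the query string.
    return url.rsplit("/", 1)[-1]

def path_filename(url):
    # urlsplit separates the path from the query ("?dl=0") and fragment.
    return posixpath.basename(urlsplit(url).path)

url = "https://www.dropbox.com/s/bhn5pc52hiik9x1/BookwormDB.zip?dl=0"
print(naive_filename(url))  # BookwormDB.zip?dl=0
print(path_filename(url))   # BookwormDB.zip
```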

Thanks again,
Tim