UnicodeEncodeError when using bulk upload

diging / amphora

A lightweight digital repository for corpus analysis

GNU General Public License v3.0

UnicodeEncodeError when using bulk upload #6

Closed ajkulkarni closed 8 years ago

ajkulkarni commented 9 years ago

[Attached screenshot: 2015-09-16 11:08:32]

ajkulkarni commented 9 years ago

Similar issue reference: https://github.com/diging/tethne/issues/85

erickpeirson commented 9 years ago

Can you add some details about how to recreate this error? See this guide for reference.

ajkulkarni commented 9 years ago

1. Collect some bibliographic records using Zotero.
2. Arrange them into collections based on the subject of the records.
3. Export the collection along with the data files.
4. Zip the exported data, then in the JARS Django admin interface try adding your Zotero collections (Add Resource > Bulk).

Expected: Successful bulk import with the option to save collections.
Actual: A UnicodeEncodeError is thrown, as seen in the attached screenshot.

Last Commit SHA: a2b28a313650807d28212545661b6e7a387d6bea

erickpeirson commented 9 years ago

Cool. So this happens when you click the "Submit" button on the bulk upload form?

ajkulkarni commented 9 years ago

Yes! The particular collection I created always throws this error. Should I discuss this with Nischal, since he is working on a similar Unicode encoding error?

erickpeirson commented 9 years ago

Yes, work together on this. Thanks!

nischalsamji commented 9 years ago

@nakapika For starters, you can look at this: http://www.joelonsoftware.com/articles/Unicode.html

erickpeirson commented 9 years ago

Great article!

ajkulkarni commented 9 years ago

Really good article! So does this mean that we should use UTF-8 in all our projects to prevent encoding errors?

ajkulkarni commented 9 years ago

@nischalsamji and I have been working on this for a while now. The problem is that some conference paper names contain an apostrophe that is not being encoded as UTF-8, and this breaks the code. We tried encoding all the text to UTF-8, but that didn't work. We will continue working on this today to find a permanent solution.
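
For readers following along, here is a minimal sketch of how an apostrophe can trigger this kind of error. It assumes the character is a typographic apostrophe (U+2019); the thread does not say which character was involved, and the title and the `try_encode` helper are hypothetical. It is written for Python 3 and is not the JARS code.

```python
# Hypothetical paper title containing a typographic apostrophe (U+2019).
title = "Workshop on Users\u2019 Needs"


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# The ASCII codec cannot represent U+2019, so encoding fails.
ascii_result = try_encode(title, "ascii")

# UTF-8 can represent every Unicode code point, so encoding succeeds.
utf8_result = try_encode(title, "utf-8")

print(type(ascii_result).__name__)
print(utf8_result)
```

If something in the upload path falls back to the ASCII codec, explicitly encoding some of the text to UTF-8 elsewhere would not necessarily help, which may be why the first attempt did not work.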

erickpeirson commented 9 years ago

@nakapika @nischalsamji Thanks for tackling this. Don't the .rdf files have UTF-8 encoding when they are created?

nischalsamji commented 9 years ago

@erickpeirson When the file contents contain a Unicode character, the parser works fine. When there is a Unicode character in the file name, it throws an error.
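
One way to check an upload for this condition is to list the archive members whose names are not pure ASCII. The sketch below is only an illustration under assumptions: the member names, `build_sample_archive`, and `non_ascii_members` are hypothetical, the archive is built in memory rather than exported from Zotero, and the code targets Python 3 rather than the project's own environment.

```python
import io
import zipfile


def build_sample_archive():
    """Build an in-memory zip with one ASCII and one non-ASCII member name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("plain-name.rdf", "<rdf/>")
        # Hypothetical file name with a typographic apostrophe (U+2019).
        zf.writestr("Users\u2019 Needs.rdf", "<rdf/>")
    return buf.getvalue()


def non_ascii_members(archive_bytes):
    """Return the member names that cannot be encoded as ASCII."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            try:
                name.encode("ascii")
            except UnicodeEncodeError:
                flagged.append(name)
    return flagged


flagged = non_ascii_members(build_sample_archive())
print(flagged)
```

Running a check like this against the failing collection would show whether the error lines up with specific file names, which would support the diagnosis above.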