Snikch63200 closed this issue 1 week ago.
The JSONDecodeError you're encountering is not due to the lack of a DOI. The error message "Extra data" means the parser found a complete, valid JSON object and then more, unexpected content after it in the same string. This is usually caused by extra characters or text appended to the input string.
To resolve this, make sure the JSON string is correctly formatted and contains nothing outside the JSON object. Double-check that the input ends right after the closing brace and has no trailing characters or syntax errors.
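A quick way to confirm this reading of the error is to reproduce it with Python's standard json module: a null value parses without complaint, while any non-whitespace text after the closing brace triggers "Extra data". The strings below are illustrative, not the actual input from this issue.

```python
import json

# A JSON object whose "doi" is null parses fine; null becomes None.
valid = '{"title": "Example", "doi": null}'
parsed = json.loads(valid)
print(parsed["doi"])  # None

# The same object followed by free-form text fails with "Extra data".
with_trailing_text = valid + "\nExplanation: no DOI was found."
error = None
try:
    json.loads(with_trailing_text)
except json.JSONDecodeError as err:
    error = err  # keep a reference; "err" is cleared after the except block
    print(err.msg, "at position", err.pos)  # err.pos points at the extra text
```

So a missing DOI on its own does not break parsing; the trailing text does.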
To continue talking to Dosu, mention @dosu.
How can I check the JSON string? @dosu
You can check the JSON string by printing it just before the json.loads() call to see its exact content, then look for any unexpected characters or formatting issues. You can also paste it into an online JSON validator or a tool like jsonlint to verify that it is valid. If you find extra data or syntax errors, correct them before parsing the JSON again.
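The printing step above can be made more informative with a small helper that shows the string with repr() (so newlines and stray tokens become visible) and reports exactly where json.loads stops. The name check_json is hypothetical and not part of any library; it only uses the position attributes that the standard json.JSONDecodeError provides. For a string saved to a file, the standard library's python -m json.tool does a similar validity check from the command line.

```python
import json


def check_json(text: str) -> bool:
    """Print the raw string and report where json.loads fails, if it does.

    Hypothetical debugging helper, not part of paper-qa or the json module.
    """
    # repr() makes invisible characters such as "\n" or trailing tokens visible.
    print(repr(text))
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno locate the failure; pos is the character offset.
        print(f"{err.msg} at line {err.lineno}, column {err.colno}")
        print("Unexpected content starts with:", repr(text[err.pos:err.pos + 40]))
        return False
    print("Valid JSON")
    return True


check_json('{"doi": null}\nExplanation: no DOI was provided.')
```

For the example call, the helper reports "Extra data" and shows that the unexpected content begins with the word "Explanation".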
Here is the content of the clean_text variable, printed before the json.loads call:
{
"title": "La politique culturelle de la Communauté française de Belgique : origines, évolutions et enjeux",
"authors": ["Auby, Valérie"],
"doi": null
}
Explanation:
I analyzed the PDF file with the following text:
Author: Auby, Valérie
Title: La politique culturelle de la Communauté française de Belgique : origines, évolutions et enjeux
Source: File 59 - Auby - Pôle culturel.pdf
DOI: unknown
Since no DOI is provided in the citation, I return the DOI value as null.</s>
What is the problem? @dosu
The issue with your clean_text variable is that it contains additional text after the JSON object, which causes the JSONDecodeError. The JSON parser expects only the JSON object, but your string includes explanatory text after the closing brace.
To fix this, ensure that clean_text contains only the JSON object. You can extract the JSON part by trimming everything after the closing brace }. Here's a quick way to do it:
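import json

# Assuming clean_text is your variable. This keeps everything before the first
# "}\n" and re-appends the brace, so it only works for a flat JSON object whose
# closing brace is followed by a newline, as in the output above.
json_part = clean_text.split('}\n')[0] + '}'
citation_json = json.loads(json_part)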
This will isolate the JSON content and allow json.loads() to parse it correctly.
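A sturdier alternative to splitting on '}\n' is the standard library's json.JSONDecoder.raw_decode, which parses one JSON value from the start of a string and returns it together with the index where it ended, instead of raising "Extra data". The helper name extract_first_json_object is hypothetical, the sketch assumes the object begins at the first '{' in the string, and it is not the fix that was merged into paper-qa.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Parse the first JSON object in text and ignore anything after it.

    Sketch under the assumption that the object starts at the first '{'.
    """
    start = text.index("{")  # raises ValueError if the text has no '{' at all
    # raw_decode returns (value, end_index) and does not reject trailing data.
    value, _end = json.JSONDecoder().raw_decode(text[start:])
    return value


sample = '{\n"authors": ["Auby, Valérie"],\n"doi": null\n}\nExplanation: no DOI was provided.</s>'
citation_json = extract_first_json_object(sample)
print(citation_json)  # {'authors': ['Auby, Valérie'], 'doi': None}
```

Unlike the split approach, this does not depend on a newline following the closing brace, and nested objects are handled because the decoder tracks brace depth itself.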
There is a fix here: https://github.com/Future-House/paper-qa/pull/551
In the meantime, you can turn off DocDetails with
settings.parsing.use_doc_details = False
which will prevent that call path.
Hello,
I get this error when I try to add a document with the 'add' method of a Doc() object:
Is the JSON error a consequence of the lack of a DOI in the text?