dalejn / cleanBib

Probabilistically assign gender and race proportions of first/last authors pairs in bibliography entries
MIT License

Error creating cleanedBib.csv #40

Closed buddhikabellana closed 2 years ago

buddhikabellana commented 2 years ago

Hi! I used this tool for a preprint a few months back, but when I tried running it again today I ran into some issues. When I run the first code block, it doesn't produce a cleanedBib.csv output, and the next two code blocks then fail with errors.

I've also attached the bib file in case something is wrong with it (saved as a .txt, since it won't let me upload a .bib).

Here's the error I get after I run the first code block:

/srv/conda/envs/notebook/lib/python3.7/site-packages/sos_notebook/kernel.py:1334: DeprecationWarning: Kernel._parent_header is deprecated in ipykernel 6. Use .get_parent()
  msg['msg_id'] = self._parent_header['header']['msg_id']
Using TensorFlow backend.

No optional .tex file found.

Here's the output for the second code block:

---------------------------------------------------------------------------
TokenRequired                             Traceback (most recent call last)
/tmp/ipykernel_168/4235134905.py in <module>
    179     LA = []
    180     parser = bibtex.Parser()
--> 181     bib_data = parser.parse_file(ID[0])
    182     counter = 1
    183     nameCount = 0

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/__init__.py in parse_file(self, filename, file_suffix)
     52         with open_file(filename, encoding=self.encoding) as f:
     53             try:
---> 54                 self.parse_stream(f)
     55             except UnicodeDecodeError as e:
     56                 raise PybtexError(six.text_type(e), filename=self.filename)

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_stream(self, stream)
    408     def parse_stream(self, stream):
    409         text = stream.read()
--> 410         return self.parse_string(text)

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_string(self, text)
    395             macros=self.macros,
    396         )
--> 397         for entry in entry_iterator:
    398             entry_type = entry[0]
    399             entry_type_lower = entry_type.lower()

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_bibliography(self)
    193                 yield tuple(self.parse_command())
    194             except PybtexSyntaxError as error:
--> 195                 self.handle_error(error)
    196             except SkipEntry:
    197                 pass

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in handle_error(self, error)
    381     def handle_error(self, error):
    382         from pybtex.errors import report_error
--> 383         report_error(error)
    384 
    385     def parse_string(self, text):

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/errors.py in report_error(exception)
     76 
     77     if strict:
---> 78         raise exception
     79     else:
     80         print_error(exception, 'WARNING: ')

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_bibliography(self)
    191             self.command_start = self.pos - 1
    192             try:
--> 193                 yield tuple(self.parse_command())
    194             except PybtexSyntaxError as error:
    195                 self.handle_error(error)

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_command(self)
    224             self.required([body_end])
    225         except PybtexSyntaxError as error:
--> 226             self.handle_error(error)
    227         return make_result()
    228 

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in handle_error(self, error)
    381     def handle_error(self, error):
    382         from pybtex.errors import report_error
--> 383         report_error(error)
    384 
    385     def parse_string(self, text):

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/errors.py in report_error(exception)
     76 
     77     if strict:
---> 78         raise exception
     79     else:
     80         print_error(exception, 'WARNING: ')

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_command(self)
    221             make_result = lambda: (command, (self.current_entry_key, self.current_fields))
    222         try:
--> 223             parse_body(body_end)
    224             self.required([body_end])
    225         except PybtexSyntaxError as error:

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_entry_body(self, body_end)
    240             key_pattern = self.KEY_PAREN if body_end == self.RPAREN else self.KEY_BRACE
    241             self.current_entry_key = self.required([key_pattern]).value
--> 242         self.parse_entry_fields()
    243         if not self.want_current_entry():
    244             raise SkipEntry

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_entry_fields(self)
    248             self.current_field_name = None
    249             self.current_value = []
--> 250             self.parse_field()
    251             if self.current_field_name and self.current_value:
    252                 self.current_fields.append((self.current_field_name, self.current_value))

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/database/input/bibtex.py in parse_field(self)
    260             return
    261         self.current_field_name = name.value
--> 262         self.required([self.EQUALS])
    263         self.parse_value()
    264 

/srv/conda/envs/notebook/lib/python3.7/site-packages/pybtex/scanner.py in required(self, patterns, description, allow_eof)
    118             if not description:
    119                 description = ' or '.join(pattern.description for pattern in patterns)
--> 120             raise TokenRequired(description, self)
    121         else:
    122             return token

TokenRequired: syntax error in line 205: '=' expected

And here's the output for the third code block:

Warning message in file(file, "rt"):
“cannot open file '/home/jovyan/cleanedBib.csv': No such file or directory”

Error in file(file, "rt"): cannot open the connection
Traceback:

1. read.csv("/home/jovyan/cleanedBib.csv", stringsAsFactors = F)
2. read.table(file = file, header = header, sep = sep, quote = quote, 
 .     dec = dec, fill = fill, comment.char = comment.char, ...)
3. file(file, "rt")

Any thoughts would be appreciated! Thanks for putting this tool together!

all_refs_NoCDS copy.txt

dalejn commented 2 years ago

Hi, thanks for using the tool and for reaching out! The error points to a parsing problem at line 205 of the .bib file, in this entry:

@article{Ortiz De Gortari2015Game, journal={Computers in Human Behavior}, doi={10.1016/j.chb.2015.04.060}, issn=07475632, number={PA}, note={publisher: Elsevier Ltd}, title={Game Transfer Phenomena and its associated factors: An exploratory empirical online survey study}, volume=51, author={Ortiz De Gortari, Angelica B. and Griffiths, Mark D.}, pages={195--202}, date=2015, year=2015, }

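The parser can't handle spaces in an entry's tag (its citation key). Judging from the traceback, it appears to read Ortiz as the key and then treat De as a field name, which is why it reports that '=' is expected. Replacing that entry's tag Ortiz De Gortari2015Game with OrtizDeGortari2015Game fixes the error. The third code block fails only because cleanedBib.csv was never written, so it should work once the .bib file parses.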
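If several entries have the same problem, the fix can be applied to the whole file at once. The following is only a rough sketch and not part of cleanBib: the names `ENTRY_START`, `find_keys_with_whitespace`, and `strip_key_whitespace` are made up for illustration, and the regular expression assumes each key runs up to the first comma and contains no braces.

```python
import re

# Start of a BibTeX entry: "@type{key," with the type and key captured.
# Assumption: the key runs up to the first comma and contains no braces.
ENTRY_START = re.compile(r"@(\w+)\s*\{([^,{}]*),")

# These commands have no citation key, so they are left untouched.
NON_ENTRY_TYPES = {"string", "preamble", "comment"}


def find_keys_with_whitespace(bibtex_text):
    """Return the citation keys that contain internal whitespace."""
    keys = []
    for match in ENTRY_START.finditer(bibtex_text):
        entry_type, key = match.group(1), match.group(2).strip()
        if entry_type.lower() not in NON_ENTRY_TYPES and re.search(r"\s", key):
            keys.append(key)
    return keys


def strip_key_whitespace(bibtex_text):
    """Return the text with all whitespace removed from each citation key."""
    def fix(match):
        entry_type, key = match.group(1), match.group(2)
        if entry_type.lower() in NON_ENTRY_TYPES:
            return match.group(0)
        return "@{}{{{},".format(entry_type, re.sub(r"\s+", "", key))
    return ENTRY_START.sub(fix, bibtex_text)


# Shortened version of the entry quoted above.
sample = (
    "@article{Ortiz De Gortari2015Game, volume=51, "
    "author={Ortiz De Gortari, Angelica B. and Griffiths, Mark D.}, year=2015, }"
)
print(find_keys_with_whitespace(sample))
print(strip_key_whitespace(sample))
```

Only the key is rewritten; the author field keeps its spaces. Two caveats: any citations of the old keys in the manuscript would need the same change, and removing spaces could in principle make two different keys identical, so it is worth checking the list from `find_keys_with_whitespace` before overwriting the file.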

buddhikabellana commented 2 years ago

Ah -- had a couple of those. Thanks for pointing that out!