digitalimagep closed this issue 4 years ago.
@digitalimagep : Can you send me your code to reproduce this? Thanks.
I also encountered this issue:
import arxivscraper
scraper = arxivscraper.Scraper(category='physics:cond-mat', date_from='2017-05-27',date_until='2017-06-07')
output = scraper.scrape()
output:
http://export.arxiv.org/oai2?verb=ListRecords&from=2017-05-27&until=2017-06-07&metadataPrefix=arXiv&set=physics:cond-mat
fetching up to 1000 records...
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-47bd483a35f6> in <module>
----> 1 output = scraper.scrape()
~/anaconda/envs/word2vec/lib/python3.6/site-packages/arxivscraper/arxivscraper.py in scrape(self)
168 for record in records:
169 meta = record.find(OAI + 'metadata').find(ARXIV + 'arXiv')
--> 170 record = Record(meta).output()
171 if self.append_all:
172 ds.append(record)
~/anaconda/envs/word2vec/lib/python3.6/site-packages/arxivscraper/arxivscraper.py in __init__(self, xml_record)
42 self.updated = self._get_text(ARXIV, 'updated')
43 self.doi = self._get_text(ARXIV, 'doi')
---> 44 self.authors = self._get_authors()
45 self.affiliation = self._get_affiliation()
46
~/anaconda/envs/word2vec/lib/python3.6/site-packages/arxivscraper/arxivscraper.py in _get_authors(self)
55 authors_xml = self.xml.findall(ARXIV + 'authors/' + ARXIV + 'author')
56 last_names = [author.find(ARXIV + 'keyname').text.lower() for author in authors_xml]
---> 57 first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml]
58 full_names = [a+' '+b for a,b in zip(first_names, last_names)]
59 return full_names
~/anaconda/envs/word2vec/lib/python3.6/site-packages/arxivscraper/arxivscraper.py in <listcomp>(.0)
55 authors_xml = self.xml.findall(ARXIV + 'authors/' + ARXIV + 'author')
56 last_names = [author.find(ARXIV + 'keyname').text.lower() for author in authors_xml]
---> 57 first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml]
58 full_names = [a+' '+b for a,b in zip(first_names, last_names)]
59 return full_names
AttributeError: 'NoneType' object has no attribute 'text'
I also got the same error:
AttributeError: 'NoneType' object has no attribute 'text'
Any solution for this?
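The traceback points at `author.find(ARXIV + 'forenames').text`. ElementTree's `find()` returns `None` when the child element is absent, and `None.text` raises exactly this `AttributeError`. The sketch below reproduces that on a hand-made record. It assumes that arxivscraper's `ARXIV` constant is the `{http://arxiv.org/OAI/arXiv/}` namespace prefix. The sample XML is hypothetical, not a real arXiv record.

```python
import xml.etree.ElementTree as ET

# Assumption: the arXiv metadata namespace used by arxivscraper's ARXIV constant.
ARXIV = "{http://arxiv.org/OAI/arXiv/}"

# Hypothetical sample record: the second author has a <keyname> but no
# <forenames> element (e.g. a collaboration or a single-name author).
SAMPLE = """
<arXiv xmlns="http://arxiv.org/OAI/arXiv/">
  <authors>
    <author><keyname>Doe</keyname><forenames>Jane</forenames></author>
    <author><keyname>Collaboration</keyname></author>
  </authors>
</arXiv>
"""

meta = ET.fromstring(SAMPLE.strip())
authors_xml = meta.findall(ARXIV + "authors/" + ARXIV + "author")

for author in authors_xml:
    # find() returns None when the child element is missing.
    print(author.find(ARXIV + "forenames") is None)  # False, then True

try:
    # Same expression as line 57 of arxivscraper.py.
    first_names = [a.find(ARXIV + "forenames").text.lower() for a in authors_xml]
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'text'
```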
Well, I managed to bypass the error by implementing this temporary fix:
def _get_authors(self):
    authors_xml = self.xml.findall(ARXIV + 'authors/' + ARXIV + 'author')
    last_names, first_names = [], []
    for author in authors_xml:
        # Some records contain an <author> without <keyname> or <forenames>;
        # find() then returns None and .text raises AttributeError.
        try:
            last_names.append(author.find(ARXIV + 'keyname').text.lower())
        except AttributeError:
            last_names.append("")
        try:
            first_names.append(author.find(ARXIV + 'forenames').text.lower())
        except AttributeError:
            first_names.append("")
    # One entry per author in both lists, so zip() keeps names aligned.
    full_names = [a + ' ' + b for a, b in zip(first_names, last_names)]
    return full_names
It seems that some records contain an author with no first name (no forenames element). I handle that case by appending an empty string. A more careful fix should treat authors without a forename explicitly, but for research purposes this works as a temporary fix.
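One possible "more formal" handling, sketched below under the same namespace assumption, joins only the name parts that actually exist. With the workaround above, an author without a forename becomes `" doe"` (leading space). This version yields `"doe"` instead. `author_full_names` is a hypothetical standalone helper for illustration, not part of arxivscraper's API.

```python
import xml.etree.ElementTree as ET

# Assumption: same namespace prefix as arxivscraper's ARXIV constant.
ARXIV = "{http://arxiv.org/OAI/arXiv/}"


def author_full_names(meta):
    """Return lower-cased full names, one per <author>, tolerating missing parts."""
    full_names = []
    for author in meta.findall(ARXIV + "authors/" + ARXIV + "author"):
        parts = []
        for tag in ("forenames", "keyname"):
            node = author.find(ARXIV + tag)
            # Skip absent elements and empty text instead of raising.
            if node is not None and node.text:
                parts.append(node.text.strip().lower())
        # Join only the parts that exist, so a missing forename does not
        # leave a leading space in the result.
        full_names.append(" ".join(parts))
    return full_names


# Hypothetical sample record for illustration.
SAMPLE = ('<arXiv xmlns="http://arxiv.org/OAI/arXiv/"><authors>'
          '<author><keyname>Doe</keyname><forenames>Jane</forenames></author>'
          '<author><keyname>Example Collaboration</keyname></author>'
          '</authors></arXiv>')

print(author_full_names(ET.fromstring(SAMPLE)))
# ['jane doe', 'example collaboration']
```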
I encountered this as well. You may only need to change the statement on line 57:
try:
    first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml]
except:
    first_names = []
or maybe just add an if condition inside the comprehension, i.e. change the original from
first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml]
to
first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml if author.find(ARXIV + 'forenames') is not None]
Won't solution 1 drop all first names as soon as a single author raises an error? And won't solution 2 produce first_names and last_names lists of different lengths?
But I do think creating an if-else condition is the way to go.
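The two concerns can be checked without network access. The sketch below is a minimal illustration that replaces the parsed XML with a hypothetical list of `(forenames, keyname)` pairs, using `None` where the forenames element is missing. It shows what each approach does once the names are paired with `zip()`, as `_get_authors` does.

```python
# Hypothetical stand-in for parsed authors: (forenames, keyname) pairs,
# with None where the <forenames> element is missing.
authors = [("Jane", "Doe"), (None, "Collaboration"), ("John", "Smith")]

last_names = [k.lower() for _, k in authors]

# Solution 1 (try/except around the whole comprehension): one bad author
# discards every first name, and zip() then yields nothing.
try:
    first_names_1 = [f.lower() for f, _ in authors]
except AttributeError:
    first_names_1 = []
pairs_1 = [a + " " + b for a, b in zip(first_names_1, last_names)]
print(pairs_1)  # []

# Solution 2 (filter inside the comprehension): the list is shorter, so
# zip() pairs first names with the wrong last names and drops the tail.
first_names_2 = [f.lower() for f, _ in authors if f is not None]
pairs_2 = [a + " " + b for a, b in zip(first_names_2, last_names)]
print(pairs_2)  # ['jane doe', 'john collaboration']

# If-else (conditional expression): one entry per author, alignment kept.
first_names_3 = [f.lower() if f is not None else "" for f, _ in authors]
pairs_3 = [a + " " + b for a, b in zip(first_names_3, last_names)]
print(pairs_3)  # ['jane doe', ' collaboration', 'john smith']
```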
+1
I slightly changed the suggestion by @treemantan and fixed the empty-name case in my locally installed version.
Here's the code I used:
first_names = [author.find(ARXIV + 'forenames').text.lower() if author.find(ARXIV + 'forenames') is not None else 'n/a' for author in authors_xml ]
I prefer to have an 'n/a' as string and replace it later if I need.
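A variant of the same idea uses ElementTree's `Element.findtext(match, default=...)`. It returns `default` when no matching element exists, so each author is looked up once instead of calling `find()` twice. This is a sketch assuming the same `ARXIV` namespace prefix and a hypothetical sample record. Note that `findtext()` returns `""` (not the default) when the element exists but is empty.

```python
import xml.etree.ElementTree as ET

# Assumption: same namespace prefix as arxivscraper's ARXIV constant.
ARXIV = "{http://arxiv.org/OAI/arXiv/}"

# Hypothetical sample record for illustration.
SAMPLE = """<arXiv xmlns="http://arxiv.org/OAI/arXiv/">
  <authors>
    <author><keyname>Doe</keyname><forenames>Jane</forenames></author>
    <author><keyname>Collaboration</keyname></author>
  </authors>
</arXiv>"""

authors_xml = ET.fromstring(SAMPLE).findall(ARXIV + "authors/" + ARXIV + "author")

# findtext() falls back to the 'n/a' placeholder when the child element is
# missing, so both lists keep exactly one entry per author.
first_names = [author.findtext(ARXIV + "forenames", default="n/a").lower()
               for author in authors_xml]
last_names = [author.findtext(ARXIV + "keyname", default="n/a").lower()
              for author in authors_xml]

print(first_names)  # ['jane', 'n/a']
print(last_names)   # ['doe', 'collaboration']
```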
PR #9 should have resolved this. Closing the issue.
site-packages/arxivscraper/arxivscraper.py", line 57, in
first_names = [author.find(ARXIV + 'forenames').text.lower() for author in authors_xml]
AttributeError: 'NoneType' object has no attribute 'text'