Open davidpitl opened 6 years ago
It looks like your input data has a different naming schema from what I'm familiar with...
Example of zip filenames in my data directory:
2017-10-30_ANICITEDBY_00-1.zip
2017-10-30_ANICITEDBY_00-2.zip
2017-10-30_ANICITEDBY_00-3.zip
2017-10-30_ANICITEDBY_00-4.zip
2017-10-30_ANICITEDBY_00-5.zip
2017-10-30_ANICITEDBY_00-6.zip
...
Example of XML filenames contained in the zip files:
2-s2.0-85031432039-citedby.xml
2-s2.0-85031432183-citedby.xml
2-s2.0-85031432677-citedby.xml
2-s2.0-85031432760-citedby.xml
...
Well, the script is currently set up to deal with data that includes both citedby and the abstract XML, and probably won't work without that.
We have a structure like 2-s2.0-85031432760/citedby.xml. I've committed something that might help a little...
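The two layouts mentioned so far (a per-document directory such as 2-s2.0-85031432760/citedby.xml versus a flat name such as 2-s2.0-85031432760-citedby.xml) can be told apart from the member name alone. The following is only a minimal sketch based on the example names quoted in this thread; `classify_member` and the two patterns are hypothetical and are not part of the repository's db_loader.py.

```python
import re

# Hypothetical patterns, derived only from the example names in this thread.
# An optional leading directory prefix is tolerated in both cases.
NESTED = re.compile(r'^(?:.*/)?2-s2\.0-(\d+)/citedby\.xml$')
FLAT = re.compile(r'^(?:.*/)?2-s2\.0-(\d+)-citedby\.xml$')


def classify_member(name):
    """Return (layout, eid) for a zip member name, or (None, None)."""
    match = NESTED.match(name)
    if match:
        return 'nested', int(match.group(1))
    match = FLAT.match(name)
    if match:
        return 'flat', int(match.group(1))
    return None, None
```

Running this over the names returned by `zipfile.ZipFile(...).namelist()` would show which layout a given delivery uses before attempting a load.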
Are we talking about the same Scopus Custom Data XSD? I have attached the XSD manual.
David
Very possibly not. We get the cited_by files, but other files too.
I agree with you. I now have files of this type: 2017-10-30_ANI_04-xml-5.zip ... and of this other type: 2017-10-30_ANICITEDBY_00-1.zip
but I still get a new error:
Another question: how can I include new schema attributes, such as affiliation ...?
Thanks in advance, David
Affiliation is already in the schema. I've fixed that bug, sorry.
Now I get the attached error logs. My Scopus custom data has two types of files: 2017-10-30_ANI_00-xml-1.zip and 2017-10-30_ANICITEDBY_011-1.zip. Each log file corresponds to a run on one of the two file types.
I can send you example XML files contained in them.
My direct email is: david.perez@inv.uam.es
David
++ pwd
++ pwd
There is just one error, but I don't know where to send it. Directory /file_150944456025_2017-10-30_ANI-CITEDBY/ contains small zip files (e.g. 2017-10-30_ANICITEDBY_011-1.zip).
./extract_to_db.sh //file_1509358456025_2017-10-30_ANI-CITEDBY/
Traceback (most recent call last):
  File "Scopus/db_loader.py", line 424, in <module>
    main()
  File "Scopus/db_loader.py", line 416, in main
    extract_and_load_docs(args.paths, pool=pool)
  File "Scopus/db_loader.py", line 361, in extract_and_load_docs
    for counter, doc_record in enumerate(imap(_process_one, xml_pairs)):
  File "Scopus/db_loader.py", line 266, in generate_xml_pairs
    if eid_filter is not None and eid_filter(int(os.path.dirname(path).rsplit('-')[-1])):
ValueError: invalid literal for int() with base 10: ''
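One plausible reading of this traceback: the expression on line 266 takes the EID from the directory part of the path, which works for the 2-s2.0-85031432760/citedby.xml layout but yields an empty string when the XML file sits at the top level of the zip, as in 2-s2.0-85031432039-citedby.xml. The sketch below reproduces that behaviour in isolation and shows one possible alternative. `eid_from_dirname` only mirrors the expression in the traceback, and `eid_from_path` is a hypothetical helper, not the fix that was committed to the repository.

```python
import os
import re


def eid_from_dirname(path):
    # Mirrors the expression shown on line 266 of the traceback.
    return int(os.path.dirname(path).rsplit('-')[-1])


def eid_from_path(path):
    # Hypothetical alternative: look for the digits after '2-s2.0-'
    # anywhere in the path, so both layouts are handled.
    match = re.search(r'2-s2\.0-(\d+)', path)
    if match is None:
        raise ValueError('no EID found in %r' % path)
    return int(match.group(1))


if __name__ == '__main__':
    nested = '2-s2.0-85031432760/citedby.xml'
    flat = '2-s2.0-85031432039-citedby.xml'
    print(eid_from_dirname(nested))
    print(eid_from_path(nested))
    print(eid_from_path(flat))
    try:
        eid_from_dirname(flat)
    except ValueError as exc:
        # os.path.dirname(flat) is '', so int('') fails as in the log.
        print('ValueError:', exc)
```

With the flat name, `os.path.dirname` returns an empty string, so `int('')` raises the same ValueError as in the log above.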