AgentschapPlantentuinMeise / TETTRIs-mapping-taxonomists

TETTRIs WP3, task 3.2: automatic mapping of taxonomic expertise

DtypeWarning during processing #7

Closed qgroom closed 1 week ago

qgroom commented 3 weeks ago

The following warning appeared during processing. It is not clear whether it indicates an actual error, but it should perhaps be investigated.

Number of publications for primary_location.source.id:S103214341: 14
Number of publications for primary_location.source.id:S128425624: 4672
Another 13536 articles found
Number of publications for primary_location.source.id:S4210201479: 22
Number of publications for primary_location.source.id:S4210228804: 1
Taxonomic articles filtered. Results in data/interim/filtered_articles.tsv.
Abstract inverted indices converted to texts
C:\Users\quentin\Documents\GitHub\TETTRIs-mapping-taxonomists\src\data\prep_taxonomy.py:27: DtypeWarning: Columns (6,7,8,9,10,13,16,21,22) have mixed types. Specify dtype option on import or set low_memory=False.
  backbone = pd.read_csv(path, sep="\t", on_bad_lines='skip')
Taxonomic articles parsed for taxonomic subjects. Results in data/processed/taxonomic_articles_with_subjects.tsv.
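
The warning text itself names two possible remedies: passing a `dtype` or setting `low_memory=False`. A minimal sketch of what that could look like for the `pd.read_csv` call shown in the log, assuming it is acceptable to read every column as text. The function name `read_backbone` and the sample column names are hypothetical and not taken from the repository.

```python
import io

import pandas as pd


def read_backbone(source):
    """Read a tab-separated backbone file, keeping every column as text.

    Hypothetical helper: it mirrors the read_csv call from the log and adds
    the two options the warning suggests.
    """
    return pd.read_csv(
        source,
        sep="\t",
        on_bad_lines="skip",
        dtype=str,         # skip type inference, so columns cannot end up mixed
        low_memory=False,  # parse the file in one pass instead of in chunks
    )


# Tiny in-memory stand-in for the real file (column names are assumptions).
sample = io.StringIO("taxonID\tparentNameUsageID\n1\t10\n2\tabc\n")
backbone = read_backbone(sample)
print(backbone)
```

Reading everything as text avoids the warning but leaves numeric columns as strings, so any column that is later used numerically would need an explicit conversion.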
qgroom commented 3 weeks ago

This is how the script finishes...

Number of publications for primary_location.source.id:S4210201479: 22
Number of publications for primary_location.source.id:S4210228804: 1
Taxonomic articles filtered. Results in data/interim/filtered_articles.tsv.
Abstract inverted indices converted to texts
C:\Users\quentin\Documents\GitHub\TETTRIs-mapping-taxonomists\src\data\prep_taxonomy.py:27: DtypeWarning: Columns (6,7,8,9,10,13,16,21,22) have mixed types. Specify dtype option on import or set low_memory=False.
  backbone = pd.read_csv(path, sep="\t", on_bad_lines='skip')
Taxonomic articles parsed for taxonomic subjects. Results in data/processed/taxonomic_articles_with_subjects.tsv.
Authors extracted from articles. Results in data/processed/all_authors_of_taxonomic_articles.pkl and single_authors_of_taxonomic_articles.pkl.
Authors extracted from articles from selected countries. Results in data/processed/country_taxonomic_authors_no_duplicates.tsv.
<string>:30: DtypeWarning: Columns (6,7,8,9,10,13,16,21,22) have mixed types. Specify dtype option on import or set low_memory=False.
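
To judge whether the warning matters, it may help to look at which Python types actually occur in the columns it lists by position. A rough sketch, assuming the file has already been loaded into a DataFrame. The helper name `mixed_type_columns` and the demo data are hypothetical.

```python
import pandas as pd


def mixed_type_columns(df, positions):
    """Return {column name: sorted type names} for columns holding more than one type.

    `positions` are zero-based column positions, as printed in the DtypeWarning.
    Missing values are ignored.
    """
    report = {}
    for pos in positions:
        col = df.columns[pos]
        types = sorted({type(value).__name__ for value in df[col].dropna()})
        if len(types) > 1:
            report[col] = types
    return report


# Demo data (an assumption, not the real backbone): one clean and one mixed column.
demo = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "rank": pd.Series([1, "species", 2.5], dtype=object),
    }
)
print(mixed_type_columns(demo, [0, 1]))
```

For the real file, the positions from the warning (6, 7, 8, 9, 10, 13, 16, 21, 22) could be passed in place of `[0, 1]`.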