Closed: albertvillanova closed this issue 2 years ago
The PMC Article datasets are:
pmc org: https://huggingface.co/datasets/pmc/open_access

Maybe we should create a Hub organization for all these datasets: pmc or pubmed_central.
What's the difference between this and https://github.com/bigscience-workshop/data_tooling/issues/74?
@yjernite:
This is done, posted in #74: here.
Done: https://huggingface.co/datasets/bigscience-catalogue-lm-data/lm_en_pmc
Sample:
{
'text': "==== Front\nPLoS BiolPLoS BiolpbioplosbiolPLoS Biology1544-91731545-7885Public Library of Science San Francisco, USA 10.1371/journal.pbio.0000005Research ArticleGenetics/Genomics/Gene TherapyInfectious DiseasesMicrobiologyPlasmodiumThe Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum\n P. falciparum IDC TranscriptomeBozdech Zbynek ...",
'meta': "{'pmid': 12929205}"
}
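Side note on the sample above: 'meta' looks like a string holding a Python dict literal, not a nested dict. Assuming that holds for every record (only this one sample is shown here), a minimal sketch for reading the PMID back out could look like the following. `parse_meta` is a hypothetical helper name, not something in the dataset or in data_tooling, and the 'text' value is shortened for the example.

```python
import ast

# Record shaped like the sample in this thread; 'text' is shortened here.
record = {
    "text": "==== Front\n...",
    "meta": "{'pmid': 12929205}",
}


def parse_meta(meta: str) -> dict:
    """Turn the stringified 'meta' field into a dict (hypothetical helper).

    ast.literal_eval only evaluates Python literals, so it is a safer
    choice than eval() for untrusted dataset content.
    """
    parsed = ast.literal_eval(meta)
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a dict literal, got {type(parsed).__name__}")
    return parsed


meta = parse_meta(record["meta"])
print(meta["pmid"])
```

json.loads would not work on this value as shown, since the keys use single quotes.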
Thanks @lvwerra.