Closed cannin closed 3 months ago
There are 65 studies that don't have a PMID. See no_pmid_list.json.
Out of how many? You still have many studies with PMIDs and should proceed with those.
65 out of 411.
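A count like this can be reproduced with a short check. The following is only a sketch: it assumes each study is a dict with "studyId" and "pmid" fields (as in the study record posted later in this thread) and that a missing PMID appears as an empty or absent value. The second record and the helper names are hypothetical.

```python
def has_pmid(study):
    """Return True if the study record carries a non-empty PMID."""
    return bool(str(study.get("pmid") or "").strip())


def split_by_pmid(studies):
    """Split study records into (with_pmid, without_pmid)."""
    with_pmid = [s for s in studies if has_pmid(s)]
    without_pmid = [s for s in studies if not has_pmid(s)]
    return with_pmid, without_pmid


# Illustrative records only; the second one is hypothetical.
studies = [
    {"studyId": "gist_msk_2023", "pmid": "36971786"},
    {"studyId": "hypothetical_study_without_pmid", "pmid": ""},
]

with_pmid, without_pmid = split_by_pmid(studies)
print(f"{len(without_pmid)} out of {len(studies)} studies have no PMID")
```

The studies in `with_pmid` are the ones the loader can proceed with right away.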
So far, I have loaded 250 PubMed papers using the XML loader. I found that 40 studies have no PMCID, and 28 studies have a PMCID but no XML. Do you have any questions to ask the chatbot about the PubMed papers? This would help me check the accuracy and make changes if needed.
I found that 18 studies have the same PMID list, and I am not sure whether these PMIDs are correct. Also, some of the PMIDs in the list appear 32 times or more. The PMID list is: "29625048,29596782,29622463,29617662,29625055,29625050,29617662,30643250,32214244,29625049,29850653".
A simple verification question would be: how many samples are in study X? The answer could come from LangChain OpenAPI or from the publication.
Hi Augustin, I found PMIDs for the 5 studies that were missing one, but I am not sure they are correct. Could you please help me double-check when you are available?
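One quick way to inspect such a list is to count how often each PMID occurs. This is a minimal sketch using the list quoted above; the helper names are made up for illustration. To see how often a PMID appears across studies, the same counter could be fed the concatenated lists of all studies.

```python
from collections import Counter


def split_pmids(pmid_field):
    """Split a comma-separated PMID string into a list of trimmed IDs."""
    return [p.strip() for p in pmid_field.split(",") if p.strip()]


def duplicated_pmids(pmid_field):
    """Return {pmid: count} for every PMID that occurs more than once."""
    counts = Counter(split_pmids(pmid_field))
    return {pmid: n for pmid, n in counts.items() if n > 1}


pmid_field = (
    "29625048,29596782,29622463,29617662,29625055,29625050,"
    "29617662,30643250,32214244,29625049,29850653"
)

print(len(split_pmids(pmid_field)))   # 11 entries in the list
print(duplicated_pmids(pmid_field))   # {'29617662': 2}
```

In this list, 29617662 is the only PMID listed twice within a single study.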
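A check along these lines could be automated roughly as follows. This is a sketch, not part of the existing chatbot: it assumes the expected value is the study's "allSampleCount" field and that the chatbot's answer is free text containing the number. The function names and the example answers are hypothetical.

```python
import re


def numbers_in_text(text):
    """Extract integers from free text, accepting thousands separators."""
    return [int(m.replace(",", "")) for m in re.findall(r"\d[\d,]*", text)]


def answer_mentions_count(answer, expected_count):
    """Return True if the expected sample count appears in the answer."""
    return expected_count in numbers_in_text(answer)


# Hypothetical chatbot answers for a study whose allSampleCount is 469.
good_answer = "The study sequenced 469 gastrointestinal stromal tumors."
bad_answer = "The study includes 250 samples."

print(answer_mentions_count(good_answer, 469))  # True
print(answer_mentions_count(bad_answer, 469))   # False
```

A match on the number alone does not prove the answer is right, so flagged mismatches would still need a manual look at the publication.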
{ "name": "Gastrointestinal Stromal Tumors (MSK, Clin Cancer Res 2023)",
"description": "Targeted sequencing of 469 gastrointestinal stromal tumors and their matched normals via MSK-IMPACT.",
"publicStudy": true,
"pmid": "36971786",
"groups": "",
"status": 0,
"importDate": "2023-12-07 18:44:10",
"allSampleCount": 469,
"readPermission": true,
"studyId": "gist_msk_2023",
"cancerTypeId": "gist",
"referenceGenome": "hg19"}.
I think the PMID is wrong. Probably this one: https://pubmed.ncbi.nlm.nih.gov/37477937/ (talk with Ruslan about fixing it; not the highest priority).
Got it, thank you. I am working on testing the PubMed chatbot and its PDF loader.
Modify the loader to work with PMC.
Example URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC3898398&retmode=xml
Use the extract_text() code from this gist to extract the text:
https://gist.github.com/cannin/f4c1c21926a21f8a38de577ca2f0fc4c
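For illustration, a PMC-based loader might look roughly like the sketch below. It builds the efetch URL in the same form as the example above and pulls paragraph text out of the returned XML. The `paragraphs_from_xml` function is a simplified stand-in written for this example and is not the `extract_text()` from the gist; it assumes the article body uses un-namespaced `<p>` elements and that the XML has no undefined entities.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_pmc_url(pmcid):
    """Build the efetch URL for one PMC article, e.g. 'PMC3898398'."""
    query = urllib.parse.urlencode({"db": "pmc", "id": pmcid, "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def fetch_pmc_xml(pmcid, timeout=30):
    """Download the article XML (performs a network request)."""
    with urllib.request.urlopen(build_pmc_url(pmcid), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def paragraphs_from_xml(xml_text):
    """Collect the whitespace-normalized text of every <p> element.

    Simplified stand-in: nested <p> elements would be reported twice.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter("p"):
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


# Offline example with a tiny hand-written XML fragment.
sample_xml = (
    "<article><body><sec><title>Intro</title>"
    "<p>First  <italic>para</italic>.</p><p> </p>"
    "<p>Second\npara.</p></sec></body></article>"
)
print(paragraphs_from_xml(sample_xml))
```

With network access, `paragraphs_from_xml(fetch_pmc_xml("PMC3898398"))` would run the same extraction on a real article. Studies that have a PMCID but return no usable XML would still need a fallback such as the PDF loader.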