ecohealthalliance / eidith

⚠️ DEPRECATED ⚠️ Data is now at data.usaid.gov/Global-Health-Security-in-Development-GHSD-/PREDICT-Emerging-Pandemic-Threats-Project/tqea-hwmr

ed_virus() removed - workaround #109

Open emmamendelsohn opened 4 years ago

emmamendelsohn commented 4 years ago

The P1 virus extract is no longer available on the EIDITH API, and so we removed ed_virus() from this package. P1 viral data can be accessed using ed2_get() and setting p1_data = TRUE. For example:

library(eidith)
ed2_get("Test", p1_data = TRUE, postprocess = FALSE)

This returns a single Test table containing both P1 and P2 data. The Project field indicates which project each row belongs to.
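Because P1 and P2 rows now share one table, you may want to work with them separately. Below is a minimal R sketch. It assumes the combined table is already loaded into a data frame with a Project column. split_by_project() is a hypothetical helper, not an eidith function.

```r
# Hypothetical helper (not part of eidith): split a combined P1/P2 table
# into a list with one data frame per value of the Project field.
split_by_project <- function(tbl) {
  stopifnot("Project" %in% names(tbl))
  split(tbl, tbl$Project)
}

# Usage, assuming eidith is installed and the local database is downloaded:
# tests <- eidith::ed2_get("Test", p1_data = TRUE, postprocess = FALSE)
# by_project <- split_by_project(tests)
# nrow(by_project$P1); nrow(by_project$P2)
```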

Combined tables are available for "Event", "Animal", "Specimen", "Test", "TestDataInterpreted".

Note that postprocess = FALSE is currently required. This means the column headers are not cleaned and will contain capitalization, spaces, etc.

Future updates:

emmamendelsohn commented 4 years ago

@soride FYI this might affect you. Let me know if you have any issues. Also note that if you are now encountering any EIDITH issues, they may be related to these changes. You may need to install the latest version of the package, then delete and re-download the local database.

soride commented 4 years ago

@emmamendelsohn You may already know this, but I’m using the new virus finding table that downloads for both P1 and P2. I noticed duplicate GAINS4_TestIDs when looking across P1 and P2 combined, even within just the WCS data. When I subset to just P1 or just P2, there are no duplicates. This causes problems when merging, if you assume each GAINS4_TestID is unique across P1 and P2. I’ve got a workaround, but it seems strange to me that there would be duplicates. (I posted this to Tammie/Noam over email but saw Noam was out of office until March.)

emmamendelsohn commented 4 years ago

Thanks for letting me know @soride. I think Tammie is the best person to answer this, as the GAINS4_TestIDs are generated for the API. I had issues with inconsistent GAINS IDs related to a few pooled samples, but this sounds like something different. Let me know if you get to the bottom of this.

soride commented 4 years ago

Tammie's response on email - 'Yes, the SequenceID's are repeated between P1 & P2, they are different tables in the database and thus that can happen. To make it unique you have to append the ProjectCode to the ID.'
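Here is a minimal R sketch of that suggestion. It assumes, as in the records quoted later in this thread, that the project code is in the Project column and the ID is in GAINS4_TestID. The helper names, the ProjectTestID column, and the "_" separator are hypothetical choices, not part of eidith. The second helper reproduces the check described above: it counts IDs repeated across the whole table and IDs repeated within a single project.

```r
# Hypothetical helper (not part of eidith): build an ID that is unique across
# P1 and P2 by prefixing GAINS4_TestID with the project code.
add_project_test_id <- function(tbl) {
  stopifnot(all(c("Project", "GAINS4_TestID") %in% names(tbl)))
  tbl$ProjectTestID <- paste(tbl$Project, tbl$GAINS4_TestID, sep = "_")
  tbl
}

# Hypothetical helper: count repeated GAINS4_TestIDs across the whole table
# versus repeated (Project, GAINS4_TestID) pairs.
count_duplicate_ids <- function(tbl) {
  c(
    overall = sum(duplicated(tbl$GAINS4_TestID)),
    within_project = sum(duplicated(tbl[, c("Project", "GAINS4_TestID")]))
  )
}
```

As an alternative to building a new key, you can merge on both columns (merge(x, y, by = c("Project", "GAINS4_TestID"))). This also keeps P1 and P2 rows from being matched to each other.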

soride commented 4 years ago

Maybe this is related to the workaround? I can no longer find these 2 records from P1 > Peru in the new table, and they are the only 2 positive results for Peru. Tammie sent me the direct API link to this table. I can see them there (copied below) but not in what shows up in R using the code above. https://predict2api.eidith.org/api/Extract/ExtractTestDataInterpreted?country=%27peru%27&P1Data=1

{"Project":"P1","GAINS4_TestID":622628,"GAINS4_SpecimenID":907327,"Country":"Peru","IsOutbreakTesting":"N/A","SiteName":"Parque Nacional Bahuaja Sonene - Chocolatillo","ConcurrentSamplingSite":"N/A","District":null,"StateProv":"Madre de Dios and Puno","SiteLatitude":-13.19,"SiteLongitude":-70.13,"EventLatitude":-13.19,"EventLongitude":-70.13,"EventDate":"2013-06-04","Season":"N/A","SeasonModelled":"Wet","SeasonModelledDeviation":0.935033431471698,"DurationDays":"N/A","EventName":"Parque Nacional Bahuaja Sonene - Chocolatillo-2013-06-04","AnimalID":"13FM0067","TaxaGroup":"Bats","AnimalClass":"free-ranging wild animal","ScientificName":null,"ScientificNameToLowestKnownRank":"Carollia perspicillata","TaxonomicRank":"species","CommonName":"Seba's Short-Tailed Bat","LocalName":"Murciélago frutero común","IDCertainty":"field ID certain","AnimalSampleDate":"2013-06-04","Sex":"Female","SpecimenID":"13FM00672.2","SpecimenType":"rectal swab","TestDate":"2014-02-01","LabName":"Biomedicina LAB, INS, Lima","TestType":"Conventional PCR","TestRequested":"Coronaviruses","TestRequestedProtocol":"Modified Watanabe et al, RdRp gene","TestResult":"Product for Sequencing","RealTime_PositiveControlValue1":"N/A","RealTime_PositiveControlValue2":"N/A","RealTime_InternalControlValue":"N/A","RealTime_CTValue":"N/A","LabName_Confirmation":"Cloning and Sequencing","ConfirmationTestDate":"N/A","ConfirmationResult":"Positive","TestStatus":"Interpretation completed","Virus":"strain of Trinidad/1FY2BA/2007","VirusGroup":"Trinidad/1FY2BA/2007","VirusStatus":"known","HumanThreat":"no","ViralFamily":"Coronaviridae","ViralGenus":"Alphacoronavirus","Interpretation":"This is a strain of the known alphacoronavirus Trinidad/1FY2BA/2007 (Accession Number EU769557) found in bats. There is no evidence at this time to suggest this virus poses a threat to human health.","GovernmentReportName":"N/A","DateGovtApprovedRelease":"2017-03-24","Organization":"Wildlife Conservation Society"}

{"Project":"P1","GAINS4_TestID":623236,"GAINS4_SpecimenID":977513,"Country":"Peru","IsOutbreakTesting":"N/A","SiteName":"Parque Nacional Bahauaja Sonene - La Nube","ConcurrentSamplingSite":"N/A","District":null,"StateProv":"Carabaya","SiteLatitude":-13.41,"SiteLongitude":-69.6,"EventLatitude":-13.41,"EventLongitude":-69.6,"EventDate":"2013-09-23","Season":"N/A","SeasonModelled":"Dry","SeasonModelledDeviation":-0.00833013060886076,"DurationDays":"N/A","EventName":"Parque Nacional Bahauaja Sonene - La Nube-2013-09-23","AnimalID":"13FM0176","TaxaGroup":"Bats","AnimalClass":"free-ranging wild animal","ScientificName":null,"ScientificNameToLowestKnownRank":"Desmodus rotundus","TaxonomicRank":"species","CommonName":"Common Vampire Bat","LocalName":"N/A","IDCertainty":"field ID certain","AnimalSampleDate":"2013-09-23","Sex":"Female","SpecimenID":"13FM01765.2","SpecimenType":"feces","TestDate":"2014-03-01","LabName":"Biomedicina LAB, INS, Lima","TestType":"Conventional PCR","TestRequested":"Coronaviruses","TestRequestedProtocol":"Modified Watanabe et al, RdRp gene","TestResult":"Product for Sequencing","RealTime_PositiveControlValue1":"N/A","RealTime_PositiveControlValue2":"N/A","RealTime_InternalControlValue":"N/A","RealTime_CTValue":"N/A","LabName_Confirmation":"Cloning and Sequencing","ConfirmationTestDate":"N/A","ConfirmationResult":"Positive","TestStatus":"Interpretation completed","Virus":"strain of Bat Coronavirus/KP817/Phy_dis/PAN/2011","VirusGroup":"Bat Coronavirus/KP817/Phy_dis/PAN/2011","VirusStatus":"known","HumanThreat":"no","ViralFamily":"Coronaviridae","ViralGenus":"Alphacoronavirus","Interpretation":"This a strain of the known alphacoronavirus Bat Coronavirus/KP817/Phy_dis/PAN/2011 (Accession Number JX731783) found in bats. There is no evidence at this time to suggest this virus poses a threat to human health.","GovernmentReportName":"N/A","DateGovtApprovedRelease":"2017-03-24","Organization":"Wildlife Conservation Society"}

emmamendelsohn commented 4 years ago

Reinstall the package and try again. The problem may have been caused by Peru (and other P1-only countries) missing from an internal function. I was just able to see these samples with ed2_get("TestDataInterpreted", p1_data = TRUE, postprocess = FALSE, country = "Peru")

soride commented 4 years ago

Got it, and I now see these results! One quick note on reinstalling and re-downloading: I got the warning below. It seems GenbankAccessionNumber is also missing from the ed2_get patch approach?

Warning message:
In ed2_process(data, endpoint2) :
  Unexpected fields  in TestDataInterpreted download: GenbankAccessionNumber. These fields will be dropped.
Re-install the eidith package and try again. If warning persists see ?ed_contact

emmamendelsohn commented 4 years ago

I added GenbankAccessionNumber to the metadata, which should take care of the warning next time you download. No need to reinstall.

soride commented 4 years ago

Yup all fixed!