Open emmamendelsohn opened 4 years ago
@soride FYI this might affect you. Let me know if you have any issues. Also note that if you are encountering any EIDITH issues now, they may be related to these changes. You may need to install the latest version of the package, then delete and re-download the db.
@emmamendelsohn You may already know this, but I’m using the new virus finding table that downloads for both P1 and P2. I noticed that there are duplicate GAINS4_TestID values when you look across both P1 and P2 within just the WCS data. When I subset them into just P1 or just P2 there are no duplicates. The issue arises when merging under the assumption that each GAINS4_TestID appears only once across P1 and P2. I’ve got a workaround, but it seems strange to me that there would be duplicates? (I posted this to Tammie/Noam over email but saw Noam was out of office until March.)
Thanks for letting me know @soride. I think Tammie is the best person to answer this, as the GAINS4_TestIDs are generated for the API. I had issues with inconsistent GAINS IDs related to a few pooled samples, but this sounds like something different. Let me know if you get to the bottom of this.
Tammie's response on email - 'Yes, the SequenceID's are repeated between P1 & P2, they are different tables in the database and thus that can happen. To make it unique you have to append the ProjectCode to the ID.'
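For anyone else merging across P1 and P2, here is a minimal sketch of the composite-key idea from Tammie's reply. It runs on a small invented data frame rather than a real download, it assumes the Project column in the combined table plays the role of the project code, and TestKey is a hypothetical column name used only for illustration.

```r
# Toy data standing in for a combined P1/P2 download; all values are invented.
tests <- data.frame(
  Project = c("P1", "P1", "P2", "P2"),
  GAINS4_TestID = c(101, 102, 101, 103),
  stringsAsFactors = FALSE
)

# The bare ID repeats across projects (101 appears in both P1 and P2).
has_duplicate_ids <- anyDuplicated(tests$GAINS4_TestID) > 0

# Build a composite key from the project and the ID.
# "TestKey" is a hypothetical column name, not a field returned by the API.
tests$TestKey <- paste(tests$Project, tests$GAINS4_TestID, sep = "_")

# The composite key is unique across the combined table.
key_is_unique <- anyDuplicated(tests$TestKey) == 0
```

An alternative that avoids the extra column is to merge on both columns at once, e.g. merge(x, y, by = c("Project", "GAINS4_TestID")).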
Maybe related to this workaround? I can no longer find these 2 records from P1 > Peru in the new table. These are the only 2 positive results for Peru. Tammie sent me the direct API link to this table and I can see them there (copied below), but not in what shows up in R using the code above. https://predict2api.eidith.org/api/Extract/ExtractTestDataInterpreted?country=%27peru%27&P1Data=1
{"Project":"P1","GAINS4_TestID":622628,"GAINS4_SpecimenID":907327,"Country":"Peru","IsOutbreakTesting":"N/A","SiteName":"Parque Nacional Bahuaja Sonene - Chocolatillo","ConcurrentSamplingSite":"N/A","District":null,"StateProv":"Madre de Dios and Puno","SiteLatitude":-13.19,"SiteLongitude":-70.13,"EventLatitude":-13.19,"EventLongitude":-70.13,"EventDate":"2013-06-04","Season":"N/A","SeasonModelled":"Wet","SeasonModelledDeviation":0.935033431471698,"DurationDays":"N/A","EventName":"Parque Nacional Bahuaja Sonene - Chocolatillo-2013-06-04","AnimalID":"13FM0067","TaxaGroup":"Bats","AnimalClass":"free-ranging wild animal","ScientificName":null,"ScientificNameToLowestKnownRank":"Carollia perspicillata","TaxonomicRank":"species","CommonName":"Seba's Short-Tailed Bat","LocalName":"Murciélago frutero común","IDCertainty":"field ID certain","AnimalSampleDate":"2013-06-04","Sex":"Female","SpecimenID":"13FM00672.2","SpecimenType":"rectal swab","TestDate":"2014-02-01","LabName":"Biomedicina LAB, INS, Lima","TestType":"Conventional PCR","TestRequested":"Coronaviruses","TestRequestedProtocol":"Modified Watanabe et al, RdRp gene","TestResult":"Product for Sequencing","RealTime_PositiveControlValue1":"N/A","RealTime_PositiveControlValue2":"N/A","RealTime_InternalControlValue":"N/A","RealTime_CTValue":"N/A","LabName_Confirmation":"Cloning and Sequencing","ConfirmationTestDate":"N/A","ConfirmationResult":"Positive","TestStatus":"Interpretation completed","Virus":"strain of Trinidad/1FY2BA/2007","VirusGroup":"Trinidad/1FY2BA/2007","VirusStatus":"known","HumanThreat":"no","ViralFamily":"Coronaviridae","ViralGenus":"Alphacoronavirus","Interpretation":"This is a strain of the known alphacoronavirus Trinidad/1FY2BA/2007 (Accession Number EU769557) found in bats. There is no evidence at this time to suggest this virus poses a threat to human health.","GovernmentReportName":"N/A","DateGovtApprovedRelease":"2017-03-24","Organization":"Wildlife Conservation Society"}
{"Project":"P1","GAINS4_TestID":623236,"GAINS4_SpecimenID":977513,"Country":"Peru","IsOutbreakTesting":"N/A","SiteName":"Parque Nacional Bahauaja Sonene - La Nube","ConcurrentSamplingSite":"N/A","District":null,"StateProv":"Carabaya","SiteLatitude":-13.41,"SiteLongitude":-69.6,"EventLatitude":-13.41,"EventLongitude":-69.6,"EventDate":"2013-09-23","Season":"N/A","SeasonModelled":"Dry","SeasonModelledDeviation":-0.00833013060886076,"DurationDays":"N/A","EventName":"Parque Nacional Bahauaja Sonene - La Nube-2013-09-23","AnimalID":"13FM0176","TaxaGroup":"Bats","AnimalClass":"free-ranging wild animal","ScientificName":null,"ScientificNameToLowestKnownRank":"Desmodus rotundus","TaxonomicRank":"species","CommonName":"Common Vampire Bat","LocalName":"N/A","IDCertainty":"field ID certain","AnimalSampleDate":"2013-09-23","Sex":"Female","SpecimenID":"13FM01765.2","SpecimenType":"feces","TestDate":"2014-03-01","LabName":"Biomedicina LAB, INS, Lima","TestType":"Conventional PCR","TestRequested":"Coronaviruses","TestRequestedProtocol":"Modified Watanabe et al, RdRp gene","TestResult":"Product for Sequencing","RealTime_PositiveControlValue1":"N/A","RealTime_PositiveControlValue2":"N/A","RealTime_InternalControlValue":"N/A","RealTime_CTValue":"N/A","LabName_Confirmation":"Cloning and Sequencing","ConfirmationTestDate":"N/A","ConfirmationResult":"Positive","TestStatus":"Interpretation completed","Virus":"strain of Bat Coronavirus/KP817/Phy_dis/PAN/2011","VirusGroup":"Bat Coronavirus/KP817/Phy_dis/PAN/2011","VirusStatus":"known","HumanThreat":"no","ViralFamily":"Coronaviridae","ViralGenus":"Alphacoronavirus","Interpretation":"This a strain of the known alphacoronavirus Bat Coronavirus/KP817/Phy_dis/PAN/2011 (Accession Number JX731783) found in bats. There is no evidence at this time to suggest this virus poses a threat to human health.","GovernmentReportName":"N/A","DateGovtApprovedRelease":"2017-03-24","Organization":"Wildlife Conservation Society"}
Reinstall the package and try again. It may have been related to Peru (and other P1-only countries) missing from an internal function. I was just able to see these samples with ed2_get("TestDataInterpreted", p1_data = TRUE, postprocess = FALSE, country = "Peru")
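Once the download works, a quick check along these lines can confirm that both records are present. This is only a sketch on a hand-built data frame: the column names and the two GAINS4_TestID values come from the API records quoted above, while the third row (including its "Negative" value) is invented so the filter has something to exclude.

```r
# Stand-in for the downloaded TestDataInterpreted table.
# Rows 1-2 use the IDs from the quoted API records; row 3 is invented.
interpreted <- data.frame(
  Project = c("P1", "P1", "P1"),
  GAINS4_TestID = c(622628, 623236, 1),
  Country = c("Peru", "Peru", "Peru"),
  ConfirmationResult = c("Positive", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# Keep only confirmed positives from P1 Peru.
peru_p1_positive <- subset(
  interpreted,
  Project == "P1" & Country == "Peru" & ConfirmationResult == "Positive"
)

# TRUE when both expected records survived the download and the filter.
expected_ids <- c(622628, 623236)
both_present <- all(expected_ids %in% peru_p1_positive$GAINS4_TestID)
```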
Got it, and I now see these results! One quick note on reinstall and download: I got the warning below. It seems GenbankAccessionNumber is missing from the ed2_get patch approach too?
Warning message:
In ed2_process(data, endpoint2) :
Unexpected fields in TestDataInterpreted download: GenbankAccessionNumber. These fields will be dropped.
Re-install the eidith package and try again. If warning persists see ?ed_contact
I added GenbankAccessionNumber to the metadata, which should take care of the warning next time you download. No need to reinstall.
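For context, a check of this kind can be sketched as a comparison between the downloaded column names and a list of expected fields. The helper below is hypothetical and is not the eidith package's actual ed2_process() code; drop_unexpected_fields, old_metadata, and new_metadata are names made up for this illustration, and the data are invented.

```r
# Hypothetical helper, NOT the eidith package's implementation.
# Warns about and drops any column that is not listed in expected_fields.
drop_unexpected_fields <- function(data, expected_fields) {
  unexpected <- setdiff(names(data), expected_fields)
  if (length(unexpected) > 0) {
    warning(
      "Unexpected fields in download: ",
      paste(unexpected, collapse = ", "),
      ". These fields will be dropped."
    )
    data <- data[, setdiff(names(data), unexpected), drop = FALSE]
  }
  data
}

# Invented one-row download that includes the new field.
download <- data.frame(
  GAINS4_TestID = 1,
  Virus = "example",
  GenbankAccessionNumber = "N/A",
  stringsAsFactors = FALSE
)

# Before the metadata update: the new field is unknown, so it is dropped.
old_metadata <- c("GAINS4_TestID", "Virus")
cleaned_old <- suppressWarnings(drop_unexpected_fields(download, old_metadata))

# After adding the field to the metadata: nothing is dropped, no warning.
new_metadata <- c(old_metadata, "GenbankAccessionNumber")
cleaned_new <- drop_unexpected_fields(download, new_metadata)
```

Under this reading, updating the list of expected fields is enough to make the warning go away, which would be consistent with no reinstall being needed.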
Yup all fixed!
The P1 virus extract is no longer available on the EIDITH API, so we removed ed_virus() from this package. P1 viral data can be accessed using ed2_get() and setting p1_data = TRUE. For example: ed2_get("Test", p1_data = TRUE, postprocess = FALSE)
This will return a single test table that contains both P1 and P2 data, as indicated by the Project field.
Combined tables are available for "Event", "Animal", "Specimen", "Test", "TestDataInterpreted".
Note that postprocess = FALSE is currently required. This means that the column headers will contain capitalization, spaces, etc.
Future updates:
ed_db_download() to work on the combined data.
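Since postprocess = FALSE leaves the raw column headers in place, a small base R helper can tidy them by hand after the download. This is a sketch under stated assumptions: clean_names is a hypothetical function, the example headers are illustrative, and the result is not claimed to match what the package's own postprocessing would produce.

```r
# Hypothetical helper: lower-case the headers and replace runs of
# spaces or punctuation with a single underscore.
clean_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # non-alphanumeric runs become "_"
  x <- gsub("^_+|_+$", "", x)         # trim leading/trailing underscores
  tolower(x)
}

# Invented table with raw-style headers (capitalization and a space).
raw <- data.frame(1, "x", stringsAsFactors = FALSE)
names(raw) <- c("GAINS4_TestID", "Test Result")

names(raw) <- clean_names(names(raw))
```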