Closed by CuzickA 2 years ago
@jseager7 are the strain lists generated from the latest PHI-base 4.12 release?
The PHI-Canto strain lists were last updated in April 2021, so I'm guessing not. I think the strain lists diverged from PHI-base a lot more than the species lists because there were lots of strains that weren't suitable for curation.
It would make sense to repurpose my PHI-base strain cleaning pipeline so that it can generate a PHI-Canto strain list from subsequent PHI-base releases, but that depends on how many more PHI-base v4.x releases are going to be made.
Any option could take some time because I'd have to manually compare the values between PHI-Canto and PHI-base v4.12. I can use scripts to speed up some comparisons but I'd still have to write rules for the strains that should be excluded or renamed, where synonyms should be added, etc.
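A minimal sketch of the kind of scripted comparison mentioned above, assuming both strain lists can be loaded as (species, strain) pairs. The function name, the `excluded` and `renamed` rule structures, and the sample values are all hypothetical placeholders, not the actual pipeline or real PHI-base data.

```python
def compare_strains(canto, phibase, excluded=frozenset(), renamed=None):
    """Compare two strain lists after applying hand-written rules.

    canto, phibase: iterables of (species, strain) tuples.
    excluded: (species, strain) pairs in PHI-base that are unsuitable for curation.
    renamed: maps a PHI-base (species, strain) pair to the preferred strain name.
    All of these structures are assumptions made for illustration.
    """
    renamed = renamed or {}
    cleaned = set()
    for species, strain in phibase:
        if (species, strain) in excluded:
            continue  # rule: drop strains that should not be curated
        # rule: replace a PHI-base name with the preferred name, if one is set
        strain = renamed.get((species, strain), strain)
        cleaned.add((species, strain))
    canto = set(canto)
    return {
        "missing_from_canto": sorted(cleaned - canto),
        "only_in_canto": sorted(canto - cleaned),
    }


# Placeholder values, not real strain data.
canto_list = [("Species A", "S1"), ("Species A", "S2")]
phibase_list = [
    ("Species A", "S1"),
    ("Species A", "s-2"),
    ("Species A", "S3"),
    ("Species A", "unknown"),
]
report = compare_strains(
    canto_list,
    phibase_list,
    excluded={("Species A", "unknown")},
    renamed={("Species A", "s-2"): "S2"},
)
print(report)
```

The rules still have to be written by hand; the script only narrows the manual review down to the strains listed under `missing_from_canto` and `only_in_canto`.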
Hmmm, there will definitely be a PHI-base 4.13 release in ~May 2022 and possibly a 4.14 release in ~Nov 2022, depending on the status of the PHI4 -> PHI5 data migration. MC curated data from 4.14 may be able to be loaded directly into PHI5.
For the PHI-Canto manuscript it would be nice to have all the lists up-to-date for the PHI-base 4.12 release.
Okay, I'll make sure that the new strains are merged before the manuscript is submitted.
I'll also make sure they're available in PHI-Canto by the time they're uploaded here.
The strain lists have been added by commit 36100324cea71c2cbae28bd4c7a679166ae21c26.
Required for moving forward with PHI-Canto roll out and manuscript preparation.