Closed ArtPoon closed 1 year ago
The estimate-freqs.R script worked fine with the example data from the README.
The user was running into an issue because they were likely using the original constellation files from the cov-lineages repo, or a file that was not modified in our forked repo.
We forked cov-lineages/constellations and modified the "sites" entries in select constellation files to display the nucleotide substitution associated with each amino acid (AA) substitution.
The following files were modified:
constellations/constellations/definitions/cB.1.617.2.json
constellations/constellations/definitions/cBA.1.json
constellations/constellations/definitions/cBA.2.json
constellations/constellations/definitions/cBA.2.75.json
constellations/constellations/definitions/cBA.4.json
constellations/constellations/definitions/cBA.5.json
constellations/constellations/definitions/cBE.1.json
constellations/constellations/definitions/cBQ.1.1.json
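A quick way to tell which kind of file you have is to inspect the "sites" entries. The snippet below is only a rough sketch, not something from the gromstole repo. It assumes the original cov-lineages files list each site as a plain string, and that the modified files use objects with "aa" and "nt" keys. That object layout is an assumption based on the "aa"/"nt" distinction mentioned in this thread, so check it against an actual modified file. The function name classify_sites and the example values are hypothetical.

```python
import json


def classify_sites(constellation):
    """Report the layout of the "sites" list in a parsed constellation.

    Returns "strings" (assumed original cov-lineages layout), "aa_nt"
    (assumed modified layout), "other", "mixed", or "empty".
    """
    sites = constellation.get("sites", [])
    if not sites:
        return "empty"
    kinds = set()
    for site in sites:
        if isinstance(site, str):
            kinds.add("strings")
        elif isinstance(site, dict) and "aa" in site and "nt" in site:
            # Assumed shape of a modified entry; verify against a real file.
            kinds.add("aa_nt")
        else:
            kinds.add("other")
    return kinds.pop() if len(kinds) == 1 else "mixed"


# Illustrative inputs only, not copied from any real constellation file.
original = json.loads('{"label": "example", "sites": ["s:T19R", "s:L452R"]}')
modified = json.loads(
    '{"label": "example", "sites": [{"aa": "s:T19R", "nt": "C21618G"}]}'
)

print(classify_sites(original))
print(classify_sites(modified))
```

For a file on disk, pass the result of json.load() on the open file instead of the inline examples.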
That seems to be the case. Just following the links from your main gromstole repo to constellations, I end up viewing different files like this one below (not edited to have the "aa" and "nt" distinctions) rather than the ones linked above.
https://github.com/PoonLab/constellations/blob/47418a5605501552e0793fe02e5a3fffd010dc2c/constellations/definitions/cBA.5.json
Do you happen to have a script that can convert the "sites" details to this other format? That would be helpful for any of cov-lineages' constellations jsons that y'all haven't modified yet.
Really, what I'm hoping to do (the reason I asked the above question) is to have a way of using gromstole with the most up-to-date constellations on wastewater samples that span from early strains to current ones. I'll be comparing its lineage predictions with other deconvolution tools, so I'd rather not be limited to just the few lineages with constellations files that y'all have manually prepared. If that's not really feasible, please let me know, as it will mean gromstole isn't a tool I should be considering.
Hi @skunklem - we stopped updating constellations a while ago. The original purpose of Gromstole was to rapidly extract mutation frequencies from wastewater NGS data. It still does a pretty decent job of doing this. However, we were then asked to provide variant frequency estimates to distinguish between Delta and Omicron in wastewater. The binomial regression method did a reasonable job of this. Now that there are hundreds of variants that are only slightly different from each other, however, gromstole is no longer an appropriate tool for calling variant frequencies and I would direct you to one of the several deconvolution methods that have since been released.
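For readers unfamiliar with the idea, here is a minimal sketch of what a binomial estimate of a variant frequency can look like. It is not necessarily what estimate-freqs.R does. It assumes an intercept-only binomial model over a set of variant-defining mutations, in which case the maximum-likelihood frequency is the pooled proportion of mutant reads, and it uses a plain Wald interval on the logit scale. The function name pooled_frequency and the read counts are hypothetical.

```python
import math


def pooled_frequency(counts, coverage, z=1.96):
    """Intercept-only binomial estimate of a variant frequency.

    counts[i] is the number of reads carrying mutation i and coverage[i]
    is the read depth at that site. Returns (estimate, lower, upper); the
    bounds are None when the estimate is exactly 0 or 1, because the
    logit is undefined there.
    """
    total_count = sum(counts)
    total_cov = sum(coverage)
    if total_cov == 0:
        raise ValueError("no coverage at any site")
    p = total_count / total_cov
    if p in (0.0, 1.0):
        return p, None, None
    logit = math.log(p / (1 - p))
    # Standard error of the logit under a plain binomial likelihood.
    se = 1 / math.sqrt(total_cov * p * (1 - p))
    lower = 1 / (1 + math.exp(-(logit - z * se)))
    upper = 1 / (1 + math.exp(-(logit + z * se)))
    return p, lower, upper


# Made-up read counts at three variant-defining sites.
estimate, lower, upper = pooled_frequency([5, 10, 15], [100, 100, 100])
print(estimate, lower, upper)
```

An estimate like this treats every listed mutation as unique to one variant, which is the assumption that breaks down once many closely related variants share most of their mutations.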
That makes sense. Thanks for the insights.
Unfortunately, the constellation files are no longer being used in our wastewater processing, and converting the constellations is not readily automated because we had been manually selecting a subset of mutations to "uniquely define" a given variant. (The latter was not sustainable, which is why we switched to a deconvolution method, i.e., Freyja.) Closing as a wontfix issue.
Originally posted by @skunklem in https://github.com/PoonLab/gromstole/issues/76#issuecomment-1648816796