Open parlar opened 4 years ago
Hello,
Our plan during the spring is to do some development on chanjo, and one goal would be to facilitate storage of more granular data, e.g. exons.
Did you run the test using MongoDB or an SQL database? We have also been thinking of using MongoDB as a backend; however, we would first need to assess how its performance compares with SQL.
Hi @parlar, there is an open PR for a MongoDB backend in #202. However, it is hard to say right now when we will have time to make progress on it. We are hiring at the moment and hope to resume chanjo development soon, but unfortunately I cannot give a time frame yet.
Thanks!
-- Pär Larsson, PhD Clinical scientist, Bioinformatician Laboratory Medicine, Clinical Genetics / Pathology Umeå University Hospital 901 87 Umeå par.g.larsson@vll.se par.larsson@medbio.umu.se +46 90 785 2802
Hi,
I have not actually used chanjo for coverage reports, but as I recall it provides reports of "completeness" at the transcript or gene level.
Storing coverage data is tricky business since inclusion of too detailed information (per base) would quickly eat up a lot of space. However, some more granularity might still be useful for assessing the sequencing quality in different regions.
I have two questions.
Do you think it would be feasible to provide completeness info at the exon level? I ran some quick tests on a WGS dataset using all exons for all Ensembl transcripts and 4 completeness levels. The resulting data table came to 12 MB compressed and 63 MB uncompressed. Admittedly that is quite a lot, but it could be reduced significantly if, for example, CCDS were used instead. The size would also be reduced by using an SQL database, provided the data is sufficiently normalized.
In my mind, however, it would be a good thing if coverage data could be included directly in the scout system. In that case it would also be convenient if the data were stored in MongoDB, although that prevents the use of JOINs and normalized data.
Do you have any thoughts on this?
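To make the exon-level idea concrete, here is a minimal sketch of what I mean by completeness per exon, and of the two storage shapes discussed above. It is not chanjo code: the function names, the field names, and the four cutoffs (10, 15, 20, 50) are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical cutoffs; the test above only used "4 completeness levels".
THRESHOLDS = (10, 15, 20, 50)


def exon_completeness(
    depths: Sequence[int], thresholds: Sequence[int] = THRESHOLDS
) -> Dict[int, float]:
    """Fraction of bases in one exon with read depth >= each threshold."""
    if not depths:
        return {t: 0.0 for t in thresholds}
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


def as_rows(
    sample_id: str, exon_id: str, completeness: Dict[int, float]
) -> List[Tuple[str, str, int, float]]:
    """Normalized (SQL-style) shape: one row per sample, exon and threshold."""
    return [(sample_id, exon_id, t, c) for t, c in completeness.items()]


def as_document(
    sample_id: str, exon_id: str, completeness: Dict[int, float]
) -> dict:
    """Document (MongoDB-style) shape: all levels nested under one exon."""
    return {
        "sample_id": sample_id,
        "exon_id": exon_id,
        # Field names in a stored document must be strings.
        "completeness": {str(t): c for t, c in completeness.items()},
    }


# Made-up per-base depths for a single 8 bp exon.
depths = [8, 12, 25, 60, 30, 9, 18, 22]
result = exon_completeness(depths)
rows = as_rows("sample1", "exon1", result)
doc = as_document("sample1", "exon1", result)
```

The row form repeats the sample and exon keys once per level but can be joined against an exon table, while the document form keeps everything for one exon together and needs no join to read it back.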