Open donkirkby opened 4 years ago
@dmacmillan, the code to test is currently on the MultiuseDocker branch.
I've run the following samples through MiCall via Docker on Windows 10 Home successfully!
Sample | Time (m) |
---|---|
SRR11593354 | 192 |
That's great, @dmacmillan! Have you found a consensus sequence to compare it to?
The sample ID is "NRW-011", so try GISAID accession "EPI_ISL_414507".
I am waiting on a confirmation email so that I can search GISAID.
I found another sample and consensus sequence. I'll keep track of the ones I have found in this comment:
Sample | Consensus | Time (m) |
---|---|---|
SRR11593354 | EPI_ISL_414507 | 192 |
SRR11578347 | EPI_ISL_427026 | Not run |
SRR11578346 | EPI_ISL_426898 | Not run |
SRR10903401 | EPI_ISL_414507 | Not run |
Pre-existing table:
Run | Compared to | Differences |
---|---|---|
SRR11593354_1.fastq | EPI_ISL_414507 | 0 mismatches, 0 missing, and 648 added out of 29225. |
SRR11593355_1.fastq | EPI_ISL_414574 | 0 mismatches, 0 missing, and 435 added out of 29438. |
SRR11593356_1.fastq | EPI_ISL_414509 | 1 mismatches, 0 missing, and 91 added out of 29782. |
SRR11593357_1.fastq | EPI_ISL_414508 | 0 mismatches, 0 missing, and 395 added out of 29490. |
SRR11593358_1.fastq | EPI_ISL_414506 | 0 mismatches, 0 missing, and 887 added out of 28933. |
SRR11593359_1.fastq | EPI_ISL_414505 | 0 mismatches, 0 missing, and 92 added out of 29782. |
SRR11593360_1.fastq | EPI_ISL_414504 | 0 mismatches, 0 missing, and 447 added out of 29426. |
SRR11593361_1.fastq | EPI_ISL_414499 | 2 mismatches, 0 missing, and 144 added out of 29782. |
SRR11593362_1.fastq | EPI_ISL_414498 | 0 mismatches, 0 missing, and 384 added out of 29490. |
SRR11593364_1.fastq | EPI_ISL_414497 | 0 mismatches, 0 missing, and 65 added out of 29779. |
SRR11593365_1.fastq | EPI_ISL_413488 | 10 mismatches, 0 missing, and 145 added out of 29746. |
SRR11578341 | EPI_ISL_426901 | 2 mismatches, 1 missing, and 617 added out of 29249. |
SRR11578342 | EPI_ISL_426900 | 1 mismatches, 0 missing, and 398 added out of 29286. |
SRR11578343 | EPI_ISL_426899 | 0 mismatches, 0 missing, and 429 added out of 29462. |
SRR11578344 | EPI_ISL_426899 | 15 mismatches, 2 missing, and 414 added out of 29462. |
SRR11578345 | EPI_ISL_426656 | 8 mismatches, 17 missing, and 398 added out of 29498. |
SRR11578346 | EPI_ISL_426898 | 0 mismatches, 0 missing, and 488 added out of 29315. |
SRR11578347 | EPI_ISL_427026 | 0 mismatches, 0 missing, and 148 added out of 29676. |
SRR11578348 | EPI_ISL_427025 | 1 mismatches, 1 missing, and 452 added out of 29411. |
SRR11578349 | EPI_ISL_427024 | 1 mismatches, 0 missing, and 564 added out of 29301. |
SRR10903401-SARS_S1 | MN988669.1 | Very good: 12 mismatches in the first 24 bases under low coverage, and 21 extra A's at the end out of 29881. |
SRR10903402-SARS_S2 | MN988668.1 | Almost perfect: 21 extra A's at the end out of 29881. |
SRR11092056-SARS_S3 | MN996530 | Bad: 899 mismatches, 17761 missing, and 217 added out of 29854. |
SRR11092057-SARS_S4 | MN996528.1 | Very good: 4 mismatches, 33 missing, and 12 added out of 29891. Missing 14 at the start, a gap of 15 with no coverage at 5397, plus 4 single gaps of no coverage within 20 bases. The mismatches are all in low coverage; 3 are mixtures where coverage is 2. 12 extra A's at the end. |
SRR11092058-SARS_S5 | MN996527.1 | Bad: lots of sections with no coverage. 38 mismatches, 7606 missing, and 26 added out of 29825. |
SRR11092064-SARS_S6 | MN996531.1 | Bad: lots of sections with no coverage. 24 mismatches, 4667 missing, and 33 added out of 29857. |
SRR11140744-SARS_S7 | EPI_ISL_408670 | Almost perfect: 28 missing from the start, and poly-A tail replaced with ACAGATATATACGCC out of 29879. |
SRR11140746-SARS_S8 | EPI_ISL_408670 | Almost perfect: poly-A tail replaced with AATAWMAACAAACAGAGCCTAAAAAGGACAAAA4 out of 29879. |
SRR11140748-SARS_S9 | EPI_ISL_408670 | Almost perfect: 6 missing from poly-A tail out of 29879. |
SRR11140750-SARS_S10 | EPI_ISL_408670 | Almost perfect: 9 missing from the start, and poly-A tail replaced with ACAATTGCAACAATC out of 29879. |
SRR11177792-SARS_S11 | MT072688 | Almost perfect: 57 added out of 29811. A few added to start, most added at end: AGTGCTGAG + poly-A tail. |
SRR11314339-SARS_S12 | MT192765 | Almost perfect: 38 added out of 29829. A few added to start, most added at end: CCATGTGATTTTAATAG + poly-A tail. |
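
The thread does not show how the "mismatches, missing, and added" counts above were produced. As a rough illustration only, one way to tally that kind of difference between a published consensus and an assembled one is with Python's `difflib`. The function name `tally_differences` and the choice of `difflib` are assumptions for this sketch, not MiCall's actual comparison code, and `difflib` will be slow on full-length genomes.

```python
from difflib import SequenceMatcher


def tally_differences(reference, consensus):
    """Count mismatched, missing, and added bases in consensus.

    Hypothetical helper: 'missing' means bases in the reference that are
    absent from the consensus, 'added' means extra bases in the consensus.
    """
    mismatches = missing = added = 0
    # autojunk=False stops difflib from ignoring frequent characters,
    # which would otherwise affect sequences longer than 200 bases.
    matcher = SequenceMatcher(None, reference, consensus, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len = i2 - i1
        con_len = j2 - j1
        if tag == "replace":
            # Pair up bases as mismatches, then count any leftover length.
            common = min(ref_len, con_len)
            mismatches += common
            missing += ref_len - common
            added += con_len - common
        elif tag == "delete":
            missing += ref_len
        elif tag == "insert":
            added += con_len
    return mismatches, missing, added


# A consensus with three extra A's at the end, like a longer poly-A tail.
print(tally_differences("ACGTACGTAC", "ACGTACGTACAAA"))
```

A real comparison would more likely use a proper pairwise aligner, since `difflib` has no notion of gap penalties or ambiguous bases such as mixtures.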
@cbrumme @donkirkby I couldn't find a reference for sample SRR11578344, any ideas? If not I can find another.
After finishing the SARS-CoV-2 support in #549, do more extensive testing with published sample data. List of samples to download and the toolkit to download with.
Find more samples from SRA by searching for "Severe acute respiratory syndrome-related coronavirus"[orgn:__txid694009]. You can filter by platform, and there are currently 466 Illumina records.
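
If you would rather script that search than use the SRA web page, the same term can be passed to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact NCBI. The helper name `build_sra_search_url` is made up for illustration, and the `[Platform]` field tag used for filtering is an assumption worth checking against the SRA advanced search builder.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search term quoted in this issue.
SRA_TERM = (
    '"Severe acute respiratory syndrome-related coronavirus"'
    "[orgn:__txid694009]"
)


def build_sra_search_url(term, platform=None, retmax=20):
    """Build an esearch URL for the SRA database (hypothetical helper)."""
    if platform:
        # Assumed field tag for restricting results by sequencing platform.
        term = f'{term} AND "{platform}"[Platform]'
    query = urlencode(
        {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{ESEARCH_URL}?{query}"


# Print the URL; fetching it is left to the caller.
print(build_sra_search_url(SRA_TERM, platform="illumina"))
```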
It can be tricky to find the published consensus sequence for a sample. I registered for GISAID and found accession EPI_ISL_408670, but it took me a while to figure out that the description in the SRA abstract for SRR11140746 (SARS-CoV-2/2019-nCoV/USA-WI-1/2020) loosely matches the virus name in GISAID for EPI_ISL_408670 (hCoV-19/USA/WI1/2020).
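
To make that kind of loose match less manual, a small normalization step can help. This is only a sketch: the label list and the helper names (`strain_key`, `names_match`) are assumptions based on the two names quoted above, and other submitters may format their names differently.

```python
import re

# Virus labels seen in the two names above, squashed to lowercase
# alphanumerics. This list is an assumption, not a complete vocabulary.
VIRUS_LABELS = {"sarscov2", "2019ncov", "hcov19"}


def _squash(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def strain_key(name):
    """Reduce a virus name to a comparable key (hypothetical helper).

    Splits on '/', drops the virus label fields, and joins what is left,
    so 'USA-WI-1/2020' and 'USA/WI1/2020' produce the same key.
    """
    parts = [_squash(part) for part in name.split("/")]
    return "".join(p for p in parts if p and p not in VIRUS_LABELS)


def names_match(sra_name, gisaid_name):
    """Return True when both names reduce to the same key."""
    return strain_key(sra_name) == strain_key(gisaid_name)


print(names_match("SARS-CoV-2/2019-nCoV/USA-WI-1/2020", "hCoV-19/USA/WI1/2020"))
```

A match from this helper is only a hint. The accession should still be confirmed by hand in GISAID.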
Art's advice: