cfe-lab / proviral


Statistics generation #2

Closed donkirkby closed 3 years ago

donkirkby commented 3 years ago

From @dmacmillan's summary e-mail.

We want a script to count the following:

a) total samples
b) number of QC "passed" samples
c) number of samples that failed due to not existing ("no_sequence" error)
d) number of samples that failed due to not being HIV ("non_hiv" error)
e) number of samples that failed due to any primer error ("no_primer" error)
f) number of samples that failed due to low internal coverage ("low_internal_cov" error)
g) number of samples that failed despite being an HIV sequence ("hiv_but_failed" error)

We want to do this on a per-run basis, a per-participant-ID basis, and an overall basis for proviral runs. In the dev branch of the proviral pipeline, I have removed the old logic, since the script should now rely on the outcome summary file to compute its numbers. I left a skeletal structure to make things easier.

donkirkby commented 3 years ago

Update on item g from Zabrina:

The “hiv_but_failed” error does not exist - rather, the final error category is “multiple contigs”.
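
For illustration only, here is a rough sketch of how the overall, per-run, and per-participant counts might be tallied from an outcome summary file. The column names (`run`, `participant`, `sample`, `error`), the convention that an empty `error` field means the sample passed, and the exact spelling of the "multiple contigs" label are all assumptions, not the pipeline's actual format.

```python
import csv
import io
from collections import Counter, defaultdict


def count_outcomes(rows, group_fields=()):
    """Tally outcomes per group; an empty group_fields tuple gives overall counts.

    Assumes each row has an "error" column (hypothetical name) that is empty
    for samples that passed QC and holds the error label otherwise.
    """
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row[field] for field in group_fields)
        counts[key]["total"] += 1
        error = row["error"].strip()
        counts[key][error or "passed"] += 1
    return counts


# Made-up rows standing in for the outcome summary file.
SAMPLE = """run,participant,sample,error
run1,P1,S1,
run1,P1,S2,no_primer
run1,P2,S3,non_hiv
run2,P1,S4,
run2,P2,S5,multiple contigs
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
overall = count_outcomes(rows)
per_run = count_outcomes(rows, ("run",))
per_participant = count_outcomes(rows, ("participant",))

for run_key, tally in sorted(per_run.items()):
    print(run_key[0], dict(tally))
```

Using one function with a configurable grouping key keeps the three levels of reporting consistent, because every level counts the same labels from the same rows.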

donkirkby commented 3 years ago

After discussion, we decided to group the error counts into these columns:

Some errors require further investigation: