The Python and R scripts emit several tabular (TSV) outputs, with significant duplication between them. Yet the report generation scripts depend on an .Rdata input that captures the R environment, including both the TSV contents and additional dataframes that are not serialized elsewhere.
Review the existing TSVs and dataframes used for report generation, and change the scripts to emit TSVs following a more practical tabular schema. Those tables may include:
- Sample ID and other relevant input metadata (see #12)
- Read alignment info relative to the vector annotation
- Sequence variants/errors from CIGAR strings
- Vector genome type/subtype labels currently computed within the R scripts, e.g. "snapback", "other"
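As a rough illustration of a per-read table along these lines, the Python sketch below parses a CIGAR string into a reference span and per-operation base counts, then writes one TSV row per read. The column names, the `ReadRow` fields, and the `parse_cigar`/`summarize`/`write_tsv` helpers are hypothetical, not existing code in this repo. The type/subtype label is passed in rather than computed, since that logic currently lives in the R scripts. Mismatches are counted only from `X` operations, so they appear only when the aligner emits `=`/`X` instead of `M` (for example via minimap2's `--eqx` option).

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields

# SAM CIGAR operations: M, I, D, N, S, H, P, =, X
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases (per the SAM spec)
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs; '*' means unavailable."""
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings with characters the regex silently skipped
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


@dataclass
class ReadRow:
    sample_id: str
    read_id: str
    ref_name: str
    ref_start: int       # 1-based leftmost position, as in SAM POS
    ref_end: int         # 1-based inclusive end on the reference
    mismatch_bases: int  # from X ops only; M does not separate match/mismatch
    ins_bases: int
    del_bases: int
    softclip_bases: int
    label: str           # hypothetical type/subtype label, e.g. "snapback"


def _bases(ops: list[tuple[int, str]], kind: str) -> int:
    """Total bases covered by one CIGAR operation type."""
    return sum(n for n, op in ops if op == kind)


def summarize(sample_id: str, read_id: str, ref_name: str,
              pos: int, cigar: str, label: str) -> ReadRow:
    """Build one flat table row from a read's alignment fields."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    return ReadRow(
        sample_id, read_id, ref_name, pos, pos + ref_len - 1,
        _bases(ops, "X"), _bases(ops, "I"), _bases(ops, "D"),
        _bases(ops, "S"), label,
    )


def write_tsv(rows: list[ReadRow], handle) -> None:
    """Write rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(
        handle,
        fieldnames=[f.name for f in fields(ReadRow)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


if __name__ == "__main__":
    rows = [summarize("sample1", "read1", "vector", 100,
                      "5S10=1X4=2I3=1D5=", "other")]
    buf = io.StringIO()
    write_tsv(rows, buf)
    print(buf.getvalue(), end="")
```

Keeping one flat row per read means the report step could load the table directly (e.g. `read.delim` in R or `pandas.read_csv(..., sep="\t")`) instead of relying on the .Rdata environment.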