Open akikuno opened 3 hours ago
First, it might be a good idea to perform anomaly detection on MIDSV during the preprocessing stage for control, and exclude abnormal sequences (including samples with many Ns).
As for the sample, it may be effective to use the abnormal sequences found in control as a reference and exclude similar sequences from the sample.
Output reads with many sequence errors in the control as BAM/control/sequence_errors.bam, and exclude them from the analysis.
Additionally, exclude sample reads that are similar to those in BAM/control/sequence_errors.bam from the analysis as well, outputting them as BAM/sample/sequence_errors.bam.
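A minimal sketch of the control-side filtering step, under stated assumptions: each read is a dict with a comma-separated MIDSV string under the key "MIDSV", a tag counts as N when it ends in the letter N, and the 0.2 threshold is arbitrary. The function names n_fraction and split_by_n_fraction are hypothetical and not part of the existing code base. Writing the flagged reads to BAM/control/sequence_errors.bam and matching similar reads in the sample are left out.

```python
def n_fraction(midsv: str) -> float:
    """Return the fraction of MIDSV tags that end in N (assumed encoding of unknown bases)."""
    tags = [tag for tag in midsv.split(",") if tag]
    if not tags:
        return 0.0
    n_count = sum(1 for tag in tags if tag.upper().endswith("N"))
    return n_count / len(tags)


def split_by_n_fraction(reads, threshold=0.2):
    """Split reads into (kept, flagged); flagged reads exceed the N-fraction threshold.

    The threshold is a placeholder value, not a tuned parameter.
    """
    kept, flagged = [], []
    for read in reads:
        if n_fraction(read["MIDSV"]) > threshold:
            flagged.append(read)
        else:
            kept.append(read)
    return kept, flagged


if __name__ == "__main__":
    control = [
        {"QNAME": "read1", "MIDSV": "=A,=C,=G,=T"},
        {"QNAME": "read2", "MIDSV": "=A,=N,=N,=N"},
    ]
    kept, flagged = split_by_n_fraction(control)
    print([r["QNAME"] for r in kept], [r["QNAME"] for r in flagged])
```

The flagged list would correspond to the reads destined for sequence_errors.bam, while only the kept list would continue into the analysis.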
📋 Description
Currently, in clustering.score_handler.annotate_score, samples with a large number of N are being incorrectly judged as having no mismatches. Since N is clearly not a match, it must be accounted for in the scoring.
💬 Current Behavior
No response
🎯 Expected Behavior
No response
⚠️ Error message
No response
🔍 Environment
📎 Anything else?
No response