HorvathLab / NGS

Next-Gen Sequencing tools from the Horvath Lab
https://horvathlab.github.io/NGS/
MIT License

AssertionError #14

Closed Ilarius closed 1 year ago

Ilarius commented 1 year ago

Hello, I am getting this error at the very beginning of the script:

Read SNV data                                            ->|
[196859] Failed to execute script readCounts
Traceback (most recent call last):
  File "readCounts.py", line 298, in <module>
KeyError: 'ALT'
[196848] Failed to execute script scReadCounts
Traceback (most recent call last):
  File "scReadCounts.py", line 298, in <module>
  File "execute.py", line 30, in execute
AssertionError

My SNV file is a space delimited file starting like this:

chr1 56952155 C T
chr1 56956848 G T
chr1 56940964 C T
chr1 56929507 G A
chr1 56949678 T G

The pooled bam file has CB:Z tags:

A00203:382:HKCHNDMXY:1:1102:30544:9987  16      chr1    10269   3       18S71M1S        *       0       0       TAAACCTCTAACCCTAACACCCTAACCCTAACCCTACCCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCTAACCCCTAACCCC       F::,FF,FFFF:FFFFFFFFFFF:FFFFFF,FFFFF:FFF:F:FFFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:2  HI:i:2  AS:i:65 nM:i:2  RG:Z:PETRA_651883000902_GEX:0:1:HKCHNDMXY:1     RE:A:I  xf:i:0  CR:Z:ACAGAAATCCACGAAT    CY:Z:FFFFFFFFFFFFFFFF   CB:Z:ACAGAAATCCACGAAT-1 UR:Z:CCGCCGTGCGCA       UY:Z:FFFFFFFFFFFF       UB:Z:CCGCCGTGCGCA

And the command I used to run the analysis is the following:

scReadCounts -s "snv_scReadCounts.txt" -r "possorted_genome_bam.bam" -C "CellRanger_CB" -b "epithelial_pre_cells.txt" -t 1 -o "screadcounts_output.csv"

What is going wrong? I can't find the execute.py file on GitHub.

edwardsnj commented 1 year ago

Hi @Ilarius, thanks for reaching out. The part of your SNV file that you posted looks OK, so perhaps the issue is later in the file. Based on the error message, it looks like at least one of the lines in your SNV file has fewer than four whitespace-separated items. Try the command with just the small snippet you posted to verify. The following awk command will print the line number and the content of any line with fewer than four items.

awk 'NF < 4 {print NR":",$0}' snv_scReadCounts.txt

Let me know if this doesn't identify the problem.

Cheers!

Ilarius commented 1 year ago

Hello! Yes, the problem was that the file also contained deletions (written as white space) and small insertions (more than one nt). It can't work with those, correct?
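
Whether scReadCounts is meant to handle indels is not answered in this thread, so the following is only a minimal sketch of one way to separate the single-nucleotide lines from everything else before running the tool. It assumes the four-column, whitespace-delimited layout shown above (chromosome, position, REF, ALT) with no header line. The function names and the output file name in the usage note are hypothetical and not part of the NGS tools.

```python
import re

# A single-nucleotide allele: exactly one of A, C, G, T.
SINGLE_NT = re.compile(r"[ACGT]")


def classify_snv_lines(lines):
    """Split SNV lines into (kept, rejected).

    kept     -- lines with at least four fields whose REF and ALT are single nucleotides
    rejected -- (line_number, reason, line) tuples for everything else
    Blank lines are skipped. A header line, if present, would be rejected too.
    """
    kept, rejected = [], []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 4:
            # e.g. a deletion whose ALT column was left empty
            rejected.append((lineno, "fewer than 4 fields", line))
        elif not (SINGLE_NT.fullmatch(fields[2].upper())
                  and SINGLE_NT.fullmatch(fields[3].upper())):
            # e.g. a small insertion with a multi-nucleotide allele
            rejected.append((lineno, "REF/ALT is not a single nucleotide", line))
        else:
            kept.append(line)
    return kept, rejected


def filter_snv_file(in_path, out_path):
    """Write only the single-nucleotide lines of in_path to out_path."""
    with open(in_path) as handle:
        kept, rejected = classify_snv_lines(handle)
    with open(out_path, "w") as out:
        for line in kept:
            out.write(line + "\n")
    return rejected
```

For example, `filter_snv_file("snv_scReadCounts.txt", "snv_only.txt")` would write the retained lines to the (hypothetical) file `snv_only.txt` and return the rejected lines with their line numbers. Unlike the awk check above, this also flags insertions, which still have four fields.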