Closed: s2hui closed this issue 3 weeks ago
Hi @s2hui,
can you share the first few lines of your BAM file?
Hello,
Here are the first 5 lines. Please let me know if there is a more appropriate command to run. Thank you!
$ samtools view possorted_genome_bam.bam | head -n 5
NB552139:27:HKCYMBGXB:1:11112:4266:17168 272 1 10546 1 56M * 0 0 ATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTG AAEEEEEEEEA<EEEEEEEEEEEEEEEEEEEEEEEE/AEAEEAEEAE6EEEAAAAA NH:i:3 HI:i:3 AS:i:53 nM:i:1 RE:A:I li:i:0 BC:Z:TGACGCCC QT:Z:AAAAAEEE CR:Z:GCATTAGCATACTGTG CY:Z:AAAAAEEEEEEEEEEE CB:Z:GCATTAGCATACTGTG-1 UR:Z:ATTTTAGTGGGC UY:Z:EEEEEEEEEEEE UB:Z:ATTTTAGTGGGC RG:Z:counts_out_sample:0:1:HKCYMBGXB:1
NB552139:27:HKCYMBGXB:1:11306:12412:13291 272 1 10546 1 56M * 0 0 ATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTG EEEEE/EEEEEEAEEEEEEEEEE/EEEEEEEEEEEEEEE/EEEEEEEEEEEAAAAA NH:i:3 HI:i:3 AS:i:53 nM:i:1 RE:A:I li:i:0 BC:Z:TGACGCCC QT:Z:AAAAAEEE CR:Z:GCATTAGCATACTGTG CY:Z:AAAAAEEEEEEEEEEE CB:Z:GCATTAGCATACTGTG-1 UR:Z:ATTTTAGTGGGC UY:Z:EEEEEEEEEEEE UB:Z:ATTTTAGTGGGC RG:Z:counts_out_sample:0:1:HKCYMBGXB:1
NB552139:27:HKCYMBGXB:1:23203:8404:3397 256 1 11279 0 1S55M * 0 0 CGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTATA AAAAAEEEEEEEEEEEEEEEEEEEEEEE/EEEEAEEEEAAEEAEE<EAEEEEAAE/ NH:i:6 HI:i:3 AS:i:54 nM:i:0 RE:A:I li:i:0 BC:Z:GATTAGAT QT:Z:AAAAAEEE CR:Z:AATGGAACAGTAGAAT CY:Z:AAAAAEEEEEEEEEEE CB:Z:AATGGAACAGTAGAAT-1 UR:Z:ATATCCTATGTG UY:Z:EEEEEEEEEEEE UB:Z:ATATCCTATGTG RG:Z:counts_out_sample:0:1:HKCYMBGXB:1
NB552139:27:HKCYMBGXB:3:12402:19279:16040 256 1 11279 0 56M * 0 0 GCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTATAG AAAAAEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEE/E NH:i:6 HI:i:2 AS:i:55 nM:i:0 RE:A:I li:i:0 BC:Z:GATTAGAT QT:Z:AAAAAEEE CR:Z:ATCCACCTCGCTGTTC CY:Z:AAAAAEEEEEEEEEEE CB:Z:ATCCACCTCGCTGTTC-1 UR:Z:CCATATACGTGT UY:Z:EEEEEEEEEEEE UB:Z:CCATATACGTGT RG:Z:counts_out_sample:0:1:HKCYMBGXB:3
NB552139:27:HKCYMBGXB:2:23204:15697:17763 256 1 11310 0 56M * 0 0 CAGGGCCCTCTTGCTTACTGTATAGTGGTGGCACGCCGCCTGCTGGCAGCTAGGGA AAAAAEEEEEEAE<EEEEEAEEEEEE6EAEE<E<AA<EAE6EEAA/EEEEEEAEEE NH:i:6 HI:i:3 AS:i:55 nM:i:0 RE:A:I li:i:0 BC:Z:ACCGTATG QT:Z:AAAAAEEE CR:Z:TCAAGCATCGGAGTAG CY:Z:AAAAAEEEEEEEEEEE CB:Z:TCAAGCATCGGAGTAG-1 UR:Z:CTTCGGTTTCCT UY:Z:EEEEEEEEEEEE UB:Z:CTTCGGTTTCCT RG:Z:counts_out_sample:0:1:HKCYMBGXB:2
Hi @bvaldebenitom,
I got the same error message and solved it by replacing grep "chr" (on lines 292 and 293) with a string that is common to my chromosome names, which do not follow the chr# format. Perhaps exposing the search pattern as a command-line option would solve this problem for those whose genomes do not have chromosomes named chr1, chr2, etc.
After that, I got the following error message:
Traceback (most recent call last):
File "SoloTE_pipeline.py", line 321, in
I solved this by deleting the 3 instances of "stdout." on line 321, since genenumber, barcodenumber and allcounts_number are no longer channels at that point, but strings.
This seems to work, but please correct me if it is not the right way to solve these problems.
In the meantime, I will continue with the downstream analysis of the results.
Thanks for this tool!! Cristian
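The suggestion above, passing the chromosome pattern on the command line instead of hard-coding "chr", could look roughly like the sketch below. This is only an illustration: the `--chr-pattern` option, the `build_parser` and `keep_matching` helpers, and the example sequence names are all hypothetical and are not part of SoloTE's actual interface.

```python
import argparse


def build_parser():
    # Hypothetical parser; SoloTE's real options may differ.
    parser = argparse.ArgumentParser(description="Sketch of a configurable chromosome pattern")
    parser.add_argument(
        "--chr-pattern",
        default="chr",
        help="substring shared by the chromosome names to keep (default: chr)",
    )
    return parser


def keep_matching(lines, pattern):
    """Keep lines containing the pattern, like a fixed-string grep."""
    return [line for line in lines if pattern in line]


# Simulate a command line for a genome whose sequences are named scaffold_N.
args = build_parser().parse_args(["--chr-pattern", "scaffold_"])
names = ["scaffold_1", "scaffold_2", "MT"]
print(keep_matching(names, args.chr_pattern))  # ['scaffold_1', 'scaffold_2']
```

Keeping "chr" as the default would leave the behaviour unchanged for genomes that already use chr1, chr2, etc.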
@s2hui
the problem is that your BAM file sequence / chromosome names don't match those in the BED file.
For a quick fix, please run the following command to create a new BED file without the "chr" prefix:
awk 'BEGIN{FS=OFS="\t"}{gsub("chr","",$1); print $0}' Current_BED_file > NEW_BED_file
Then, remove all the files in the temp directory, and re-run SoloTE.
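Before re-running, it can help to confirm that the renamed BED file now shares sequence names with the BAM. The following is a minimal sketch in Python, assuming the names have already been collected into plain Python objects; the helper functions, the `bam_names` set and the BED records are made-up example data, not SoloTE code. `strip_chr` mirrors the awk one-liner above, deleting every "chr" from column 1.

```python
def bed_chromosomes(bed_lines):
    """Return the set of names found in column 1 of tab-separated BED lines."""
    return {line.split("\t", 1)[0] for line in bed_lines if line.strip()}


def strip_chr(bed_lines):
    """Mirror the awk command: delete every "chr" from column 1 only."""
    fixed = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = fields[0].replace("chr", "")
        fixed.append("\t".join(fields))
    return fixed


# Hypothetical example data: the BAM uses "1", the BED uses "chr1".
bam_names = {"1", "2", "X"}
bed = ["chr1\t10000\t10500\tL1_example", "chr2\t200\t900\tAlu_example"]

print(bed_chromosomes(bed) & bam_names)            # set(): no shared names
fixed = strip_chr(bed)
print(sorted(bed_chromosomes(fixed) & bam_names))  # ['1', '2']
```

An empty intersection before the fix and a non-empty one afterwards is the pattern to look for.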
Hi @cche!
Thank you for sharing your result. In the next release, we will fix the "chr" issue. Glad that you solved it.
Can you share your operating system and Python version? We have noticed that the "stdout" reference works on some operating systems and not on others (Linux vs. macOS, for example).
To add to this thread, I encountered the same issue as @cche and was able to solve it the same way. Running a previous release (1.07 or 1.06) also works. I'm running macOS Ventura and Python 3.9.5!
@bvaldebenitom
Thank you. I had noticed the BAM file sequence / chromosome name mismatch and had updated the BED file accordingly, but did not delete the temp files before running.
After deleting the temp files, it appears to be working now!
Thanks again for your help!
Hi @bvaldebenitom,
I use Rocky Linux and Ubuntu, with Python 3.10.11 installed via conda.
It is very strange that .stdout works at all on other OSs: the variables that stored the CompletedProcess objects are reassigned to the output of re.sub(), which is a string and has no .stdout attribute.
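The point about .stdout can be reproduced with a minimal sketch. This is not SoloTE's code: the child command and the `gene_number` name are made up for illustration.

```python
import re
import subprocess
import sys

# Run a trivial child process and capture its output as text.
result = subprocess.run(
    [sys.executable, "-c", "print('  42  ')"],
    capture_output=True,
    text=True,
    check=True,
)

# `result` is a CompletedProcess, so `.stdout` exists here.
raw = result.stdout

# re.sub() returns a plain str, which has no `.stdout` attribute.
gene_number = re.sub(r"\s+", "", raw)

print(type(result).__name__)           # CompletedProcess
print(type(gene_number).__name__)      # str
print(hasattr(gene_number, "stdout"))  # False
```

Once a name has been rebound to the result of re.sub(), accessing `.stdout` on it raises AttributeError in Python, so using the string directly is the consistent choice.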
I hope you solve these OS differences so that your code is stable everywhere.
Thanks again for this tool. The downstream analyses look great!
Hello,
I am running into an error similar to the one documented in #14. I created a separate issue because the thread in #14 is ongoing and I didn't want to add to it, as that might be confusing:
I'm using the versions of dependencies below:
I followed the issue thread in #14 and ran the developmental version of the pipeline script (SoloTE_developmental_20230503.tar.gz).
An output directory was created with barcode, features and matrix files, but the run still produced the above errors.
I also noticed the following files in the temp directory are empty:
-rw-r--r-- 1 s2hui group 0 May 12 08:46 sample_locustes_2.txt
-rw-r--r-- 1 s2hui group 0 May 12 08:44 sample_locustes.txt
-rw-r--r-- 1 s2hui group 0 May 11 22:15 sample_nogenes_overlappingtes.bed
-rw-r--r-- 1 s2hui group 0 May 11 22:15 sample_selectedtes.bed
-rw-r--r-- 1 s2hui group 0 May 12 08:46 sample_subftes_2.txt
-rw-r--r-- 1 s2hui group 0 May 12 08:44 sample_subftes.txt
I wonder if I also have an issue with my TE file? I only have 4 fields in my file as I don't have score values.
Thanks a lot for your help, @s2hui