Aligning pair end reads: IndexError: List index out of range.

amitfenn commented 4 years ago

Thank you for this project, Weilong. It's awesome. I have however reached a barrier I can't seem to cross and need your help.

$python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -1 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -2 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq" -g /data/home/users/afenn/FAME/GCF_000001405.39_GRCh38.p13_genomic.fna -o "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam" -u "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003__unmapped.fastq" -d /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/ --bt2--end-to-end --temp_dir=/localscratch/afenn

BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-10-06 15:53:41] Mode: Bowtie2, end-to-end alignment
[2019-10-06 15:53:41] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-10-06 15:53:41] Temporary directory: /localscratch/afenn/bs_seeker2_C1_S32L003.bam_-bowtie2-e2e-TMP-zxvFuE
[2019-10-06 15:53:41] Reduced Representation Bisulfite Sequencing: False
[2019-10-06 15:53:41] Pair end
[2019-10-06 15:53:41] Aligner command: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --quiet -D 50 --end-to-end --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -L 15 --score-min L,-0.6,-0.6 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s
[2019-10-06 15:53:41] ----------------------------------------------
[2019-10-06 15:53:41] Filename for 1st mate: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
[2019-10-06 15:53:41] Filename for 2nd mate: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq
[2019-10-06 15:53:41] The first base (for mapping): 1
[2019-10-06 15:53:41] The last base (for mapping): 200
[2019-10-06 15:53:41] Path for short reads aligner: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --quiet -D 50 --end-to-end --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -L 15 --score-min L,-0.6,-0.6 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s
[2019-10-06 15:53:41] Reference genome library path: /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2
[2019-10-06 15:53:41] Directional library
[2019-10-06 15:53:41] Number of mismatches allowed: 4
[2019-10-06 15:53:41] --------------------------------
[2019-10-06 16:00:46] Start reading and trimming the input sequences
Detected data format: fastq
Traceback (most recent call last):
  File "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", line 469, in <module>
    options.Output_unmapped_hit
  File "/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py", line 715, in bs_pair_end
    seq = l[0]
IndexError: list index out of range

I've tried rebuilding the index as well, but the error persists. Pysam version = 0.7.6

Can you help?

guoweilong commented 4 years ago

What is the version of your python?

And what will the result be for the following command:

python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -1 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -2 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq" -g GCF_000001405.39_GRCh38.p13_genomic.fna -o "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam" -u "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003__unmapped.fastq" --bt2--end-to-end --temp_dir=/localscratch/afenn

Best, Weilong

amitfenn commented 4 years ago

Python 2.7

(py2) $ python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -1 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -2 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq" -g GCF_000001405.39_GRCh38.p13_genomic.fna -o "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam" -u "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003__unmapped.fastq" --bt2--end-to-end --temp_dir=/localscratch/afenn

 BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-10-07 08:48:35] Mode: Bowtie2, end-to-end alignment
[2019-10-07 08:48:35] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-10-07 08:48:35] Temporary directory: /localscratch/afenn/bs_seeker2_C1_S32L003.bam_-bowtie2-e2e-TMP-SynbUk
[2019-10-07 08:48:35] Reduced Representation Bisulfite Sequencing: False
[2019-10-07 08:48:35] Pair end
[2019-10-07 08:48:35] Aligner command: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --quiet -D 50 --end-to-end --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -L 15 --score-min L,-0.6,-0.6 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s
[2019-10-07 08:48:35] ----------------------------------------------
[2019-10-07 08:48:35] Filename for 1st mate: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
[2019-10-07 08:48:35] Filename for 2nd mate: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq
[2019-10-07 08:48:35] The first base (for mapping): 1
[2019-10-07 08:48:35] The last base (for mapping): 200
[2019-10-07 08:48:35] Path for short reads aligner: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --quiet -D 50 --end-to-end --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -L 15 --score-min L,-0.6,-0.6 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s

[2019-10-07 08:48:35] Reference genome library path: /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2
[2019-10-07 08:48:35] Directional library
[2019-10-07 08:48:35] Number of mismatches allowed: 4
[2019-10-07 08:48:35] --------------------------------
[2019-10-07 08:55:37] Start reading and trimming the input sequences
Detected data format: fastq
Traceback (most recent call last):
  File "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", line 469, in <module>
    options.Output_unmapped_hit
  File "/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py", line 715, in bs_pair_end
    seq = l[0]
IndexError: list index out of range

guoweilong commented 4 years ago

Are the two fastq files good?

-1 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -2 "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered02.fastq"

From the code, it seems the program found that the 2nd file is empty, or that it does not match the 1st fastq file?
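One way to test whether the two mate files line up is to walk both in step and compare read IDs. The following is a minimal sketch, not part of BS-Seeker2; the function names `fastq_ids` and `first_mismatch` are hypothetical, and it assumes plain 4-line FASTQ records whose read ID is the header text before the first whitespace.

```python
def fastq_ids(path):
    """Yield the read ID of each 4-line FASTQ record in `path`."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:                  # header line of each record
                fields = line.split()
                # Drop the leading '@'; a blank header yields an empty ID.
                read_id = fields[0][1:] if fields else ""
                # Older Illumina headers mark mates with a trailing /1 or /2.
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def first_mismatch(path_1, path_2):
    """Return (record_number, id_1, id_2) for the first unpaired record, or None."""
    ids_1, ids_2 = fastq_ids(path_1), fastq_ids(path_2)
    record = 0
    while True:
        record += 1
        id_1 = next(ids_1, None)
        id_2 = next(ids_2, None)
        if id_1 is None and id_2 is None:
            return None                           # both files ended together
        if id_1 != id_2:
            return (record, id_1, id_2)           # differing IDs or unequal lengths
```

Calling `first_mismatch` on the two paths passed to `-1` and `-2` returns `None` when every record pairs up, and otherwise the 1-based record number where the files first diverge.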

Weilong

amitfenn commented 4 years ago

Hey Weilong,

I just double-checked. The Fastqc files check out alright and the second file is 40GB, so it's not empty. These are sequences from the NovaSeq Platform. Do you think that has anything to do with the error?

Thanks for your support so far.

guoweilong commented 4 years ago

It is strange... What happens if you map the reads in single-end mode, as described here: https://github.com/BSSeeker/BSseeker2#qa62

Weilong

amitfenn commented 4 years ago

The index error persists, even in single-end mode :(

(py2)$ python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -i "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -g /data/home/users/afenn/FAME/GCF_000001405.39_GRCh38.p13_genomic.fna -o "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam" -u "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003__unmapped.fastq" --temp_dir=/localscratch/afenn

 BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-10-07 14:47:43] Mode: Bowtie2, local alignment
[2019-10-07 14:47:43] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-10-07 14:47:43] Temporary directory: /localscratch/afenn/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-ngk9ZN
[2019-10-07 14:47:43] Reduced Representation Bisulfite Sequencing: False
[2019-10-07 14:47:43] Single end
[2019-10-07 14:47:43] Aligner command: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s
[2019-10-07 14:47:43] ----------------------------------------------
[2019-10-07 14:47:43] Read filename: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
[2019-10-07 14:47:43] The first base (for mapping): 1
[2019-10-07 14:47:43] The last base (for mapping): 200
[2019-10-07 14:47:43] Path for short reads aligner: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s

[2019-10-07 14:47:43] Reference genome library path: /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2
[2019-10-07 14:47:43] Directional library
[2019-10-07 14:47:43] Number of mismatches allowed: 4
[2019-10-07 14:47:43] --------------------------------
[2019-10-07 14:47:45] Start reading and trimming the input sequences
Traceback (most recent call last):
  File "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", line 412, in <module>
    options.Output_unmapped_hit
  File "/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py", line 530, in bs_single_end
    seq = l[0]
IndexError: list index out of range

guoweilong commented 4 years ago

What about the following command, with the quotation marks ("") removed?

python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -i /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq -g GCF_000001405.39_GRCh38.p13_genomic.fna -o /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam

And can you paste the result of the following command?

head /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
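Because `head` shows only the first ten lines, a problem deeper in a large file would not appear there. As a rough sketch (not part of BS-Seeker2, with a hypothetical function name `first_bad_line`), the whole file can be scanned for blank lines or records that break the 4-line FASTQ layout. It assumes one sequence line and one quality line per record.

```python
def first_bad_line(path):
    """Return (line_number, reason) for the first problem found, or None."""
    line_no = 0
    with open(path) as handle:
        for line_no, line in enumerate(handle, 1):
            position = (line_no - 1) % 4      # 0 header, 1 sequence, 2 '+', 3 quality
            text = line.strip()
            if not text:
                # An empty line would make line.split() return an empty list.
                return (line_no, "blank line")
            if position == 0 and not text.startswith("@"):
                return (line_no, "header does not start with '@'")
            if position == 2 and not text.startswith("+"):
                return (line_no, "separator does not start with '+'")
    if line_no % 4 != 0:
        return (line_no, "file ends in the middle of a record")
    return None
```

A return value of `None` means no structural problem was found under these assumptions; anything else gives the 1-based line number to inspect.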

Weilong

amitfenn commented 4 years ago

$ head /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
@A00653:13:HJ3NGDSXX:3:1101:1479:1016 1:N:0:AGCGATAG+TAATCTTA
TTTTAATAGTGTAGGAAGTTGAATAATTTATGAAGGAGAGGGGTTAGGGTTGATTTGGGAGGATTTTATTGGTGTGGGGGTTTTGTATGATTATGGGTGTTGATTAGTAGTAGTTATTGGTTGAATATTGTTTGTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF,FFFFFFFFFFFFFFF,,FFFFFFFFFF:FFFF
@A00653:13:HJ3NGDSXX:3:1101:1678:1016 1:N:0:AGCGATAG+TAATCTTA
TAATAATAATAAAAAATTTTTTAAAAGTTATAATAATGGTGATAGTTAATATTTAGTAATGTTAAATTTATTTTTTTAGGAATGTTATTTAGTAGTGTAAGATTTATTTTTTTGGGTATGTTGATTTATATTTATT
+
FF:FFFFFF:FF:FFFFFFF:FF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF,,,F,F,F,::FFFFFFFFFFFFFFFFFFFFFFFF::,,FFFFFFFFFFFFFFFFF
@A00653:13:HJ3NGDSXX:3:1101:2691:1016 1:N:0:AGCGATAG+TAATCTTA
ATAAAGTTTTAAGTTTTCTTTTTTTTTTTTTTTTTTTGTTATATTTGGTTTTTTGGTTTTTGAGTTAAAATAAATTGAAATTATTTATGTATGTTTTTTTTTTTTTATGATTGTTTTGAGGGTTTTTTTTATTTT

(py2) $ python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -i "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq" -g GCF_000001405.39_GRCh38.p13_genomic.fna -o "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam" -u "/data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003__unmapped.fastq" --temp_dir=/localscratch/afenn

 BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-10-07 14:47:43] Mode: Bowtie2, local alignment
[2019-10-07 14:47:43] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-10-07 14:47:43] Temporary directory: /localscratch/afenn/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-ngk9ZN
[2019-10-07 14:47:43] Reduced Representation Bisulfite Sequencing: False
[2019-10-07 14:47:43] Single end
[2019-10-07 14:47:43] Aligner command: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s
[2019-10-07 14:47:43] ----------------------------------------------
[2019-10-07 14:47:43] Read filename: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
[2019-10-07 14:47:43] The first base (for mapping): 1
[2019-10-07 14:47:43] The last base (for mapping): 200
[2019-10-07 14:47:43] Path for short reads aligner: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s

[2019-10-07 14:47:43] Reference genome library path: /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2
[2019-10-07 14:47:43] Directional library
[2019-10-07 14:47:43] Number of mismatches allowed: 4
[2019-10-07 14:47:43] --------------------------------
[2019-10-07 14:47:45] Start reading and trimming the input sequences
Traceback (most recent call last):
  File "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", line 412, in <module>
    options.Output_unmapped_hit
  File "/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py", line 530, in bs_single_end
    seq = l[0]
IndexError: list index out of range

guoweilong commented 4 years ago

Can you edit the code for "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", replacing every line.split() with line.split("\t")?

And then rerun the code, to see if the error is fixed.
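For reference, the two calls behave differently in plain Python, independent of BS-Seeker2. The sketch below uses the first header from the `head` output in this thread; the blank line is a hypothetical input, since the thread does not confirm that the file contains one.

```python
header = "@A00653:13:HJ3NGDSXX:3:1101:1479:1016 1:N:0:AGCGATAG+TAATCTTA\n"
blank = "\n"   # hypothetical empty line, not confirmed to exist in the data

# Default split(): separates on any run of whitespace and drops empty fields.
print(header.split())        # two fields: the read name and the "1:N:0:..." comment
print(blank.split())         # [] so indexing with [0] raises IndexError

# split("\t"): separates on tabs only and keeps spaces and the line ending.
print(header.split("\t"))    # one field holding the whole line
print(blank.split("\t"))     # ['\n'] so indexing with [0] no longer fails

# Read name as each variant would see it after dropping the leading '@'.
name_default = header.split()[0][1:]
name_tab = header.split("\t")[0][1:]
print(name_default == name_tab)   # the tab variant keeps the comment and newline
```

With the tab-only variant, the first field still contains the space-separated comment, so a name built from it would differ from one that an aligner has truncated at the first whitespace, as Bowtie 2 does for read names containing whitespace.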

Weilong

amitfenn commented 4 years ago

Perhaps I understood you wrong, but I've tried to record the changes I've made to your script here:

(py2) $ grep --include=*.{py,py,pyc,sh,sh} -rnw ~/BSSeeker/ -e 'line.split()'
/data/home/users/afenn/BSSeeker/bs_utils/utils.py:286: chrom_id = sanitize_seqid.sub('', line.split()[0][1:])
/data/home/users/afenn/BSSeeker/Antisense.py:110: tokens = line.split()
/data/home/users/afenn/BSSeeker/Antisense.py:118: tokens = line.split()
/data/home/users/afenn/BSSeeker/Antisense.py:199: tokens = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:49: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:266: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:692: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_rrbs.py:203: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_rrbs.py:587: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_align_utils.py:261: buf = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_align_utils.py:275: buf = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py:179: l = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py:515: l = line.split()

After editing the code, as you asked

(py2) $ grep --include=*.{py,py,pyc,sh,sh} -rnw ~/BSSeeker/ -e 'line.split('
/data/home/users/afenn/BSSeeker/bs_utils/utils.py:286: chrom_id = sanitize_seqid.sub('', line.split()[0][1:])
/data/home/users/afenn/BSSeeker/Antisense.py:110: tokens = line.split()
/data/home/users/afenn/BSSeeker/Antisense.py:118: tokens = line.split()
/data/home/users/afenn/BSSeeker/Antisense.py:199: tokens = line.split()
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:49: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:266: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_pair_end.py:692: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_rrbs.py:203: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_rrbs.py:587: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_align_utils.py:217: buf = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_align_utils.py:261: buf = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_align_utils.py:275: buf = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py:179: l = line.split("\t")
/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py:515: l = line.split("\t")

Running the script again

(py2) $ python /data/home/users/afenn/BSSeeker/bs_seeker2-align.py --aligner=bowtie2 -p /data/home/users/afenn/anaconda3/envs/py2/bin/ -i /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq -g GCF_000001405.39_GRCh38.p13_genomic.fna -o /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32L003.bam

 BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-10-08 13:06:27] Mode: Bowtie2, local alignment
[2019-10-08 13:06:27] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-10-08 13:06:27] Temporary directory: /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl
[2019-10-08 13:06:27] Reduced Representation Bisulfite Sequencing: False
[2019-10-08 13:06:27] Single end
[2019-10-08 13:06:27] Aligner command: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s
[2019-10-08 13:06:27] ----------------------------------------------
[2019-10-08 13:06:27] Read filename: /data/home/users/afenn/NovaSeqMS/FastqOutput/NA/54/C1_S32_L003_pcrfiltered01.fastq
[2019-10-08 13:06:27] The first base (for mapping): 1
[2019-10-08 13:06:27] The last base (for mapping): 200
[2019-10-08 13:06:27] Path for short reads aligner: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x %(reference_genome)s -f -U %(input_file)s -S %(output_file)s

[2019-10-08 13:06:27] Reference genome library path: /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2
[2019-10-08 13:06:27] Directional library
[2019-10-08 13:06:27] Number of mismatches allowed: 4
[2019-10-08 13:06:27] --------------------------------
[2019-10-08 13:06:29] Start reading and trimming the input sequences
[2019-10-08 13:06:40] Start mapping
[2019-10-08 13:06:40] Starting commands:
[2019-10-08 13:06:40] Launched: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2/W_C2T -f -U /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/Trimmed_C2T.fa.tmp-5332919 -S /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/W_C2T_m4.mapping.tmp-5332919
[2019-10-08 13:06:40] Launched: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2/C_C2T -f -U /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/Trimmed_C2T.fa.tmp-5332919 -S /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/C_C2T_m4.mapping.tmp-5332919
[2019-10-08 13:15:32] Finished: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2/W_C2T -f -U /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/Trimmed_C2T.fa.tmp-5332919 -S /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/W_C2T_m4.mapping.tmp-5332919
[2019-10-08 13:16:03] Finished: /data/home/users/afenn/anaconda3/envs/py2/bin/bowtie2 --local --quiet -p 2 -D 50 --norc --sam-nohead -k 2 -x /data/home/users/afenn/BSSeeker/bs_utils/reference_genomes/GCF_000001405.39_GRCh38.p13_genomic.fna_bowtie2/C_C2T -f -U /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/Trimmed_C2T.fa.tmp-5332919 -S /tmp/bs_seeker2_C1_S32L003.bam_-bowtie2-local-TMP-zMMPXl/C_C2T_m4.mapping.tmp-5332919
Traceback (most recent call last):
  File "/data/home/users/afenn/BSSeeker/bs_seeker2-align.py", line 412, in <module>
    options.Output_unmapped_hit
  File "/data/home/users/afenn/BSSeeker/bs_align/bs_single_end.py", line 674, in bs_single_end
    original_BS = original_bs_reads[header]
KeyError: 'A00653:13:HJ3NGDSXX:3:1101:10004:11428'

Please tell me if I did as you intended, or how I may take this further. This looks like progress to me, but now we have a new error.

guoweilong commented 4 years ago

Hi,

It did solve the previous problem, but it created a new one. BS-Seeker2 was developed under Python 2, while many users are now moving to Python 3.

Maybe you can explicitly specify the Python 2 environment; I have not seen a similar error message in my own Python 2 environment.

I noticed you are using anaconda3, so it may work if you use anaconda2.
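To confirm which interpreter actually runs a script inside a conda environment, a short check like the one below can help. It is only an illustrative sketch using the standard `sys` module; the function name `interpreter_summary` is hypothetical and nothing here is specific to BS-Seeker2.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is running this script."""
    return {
        "executable": sys.executable,      # path of the python binary in use
        "major": sys.version_info[0],      # 2 for Python 2.x, 3 for Python 3.x
        "minor": sys.version_info[1],
    }


summary = interpreter_summary()
print("%(executable)s is Python %(major)d.%(minor)d" % summary)
```

Running it with the same `python` command used for `bs_seeker2-align.py` shows whether that command resolves to a Python 2 binary, regardless of which Anaconda distribution hosts the environment.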

Best, Weilong

amitfenn commented 4 years ago

I don't think that's a solution. I was already running in a Python 2 environment, so it shouldn't make a difference whether I'm on Anaconda 3 or 2 at this stage.

Thank you so much, Weilong, for your support and for trying to fix BSseeker2 for my use case. (I would, however, leave this thread unsolved for now, for anyone else who has the same issue.)

SRenan commented 4 years ago

I have the same issue, with paired-end and single-end

python ../programs/BSseeker2/bs_seeker2-align.py -i ./data/G491-01_CORT17v4_Tcell_S21_R1_001.fastq  -g ${utilsdir}/human.g1k.v37.fa --aligner=bowtie --rrbs -o ${outdir}/mate1.bam


     BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-11-07 14:39:59] Mode: Bowtie
[2019-11-07 14:39:59] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-11-07 14:39:59] Temporary directory: /tmp/bs_seeker2_mate1.bam_-bowtie-TMP-_Qwh9y
[2019-11-07 14:39:59] Reduced Representation Bisulfite Sequencing: True
[2019-11-07 14:39:59] Single end 
[2019-11-07 14:39:59] Aligner command: /usr/bin/bowtie -e 160 --quiet --norc --sam-nohead --best -p 2 --nomaqround --sam  -k 2 %(reference_genome)s  -f %(input_file)s %(output_file)s
[2019-11-07 14:39:59] ----------------------------------------------
[2019-11-07 14:39:59] Read filename: ./data/G491-01_CORT17v4_Tcell_S21_R1_001.fastq
[2019-11-07 14:39:59] The first base (for mapping): 1
[2019-11-07 14:39:59] The  last base (for mapping): 200 
[2019-11-07 14:39:59] Path for short reads aligner: /usr/bin/bowtie -e 160 --quiet --norc --sam-nohead --best -p 2 --nomaqround --sam  -k 2 %(reference_genome)s  -f %(input_file)s %(output_file)s

[2019-11-07 14:39:59] Reference genome library path: /gpfs/group/dxl46/default/private/renan/programs/BSseeker2/bs_utils/reference_genomes/human.g1k.v37.fa_rrbs_20_500_bowtie
[2019-11-07 14:39:59] Directional library
[2019-11-07 14:39:59] Number of mismatches allowed: 4
[2019-11-07 14:39:59] -------------------------------- 
[2019-11-07 14:39:59] Start reading and trimming the input sequences
[2019-11-07 14:40:00] Processing read file: /tmp/bs_seeker2_mate1.bam_-bowtie-TMP-_Qwh9y/G491-01_CORT17v4_Tcell_S21_R1_001.fastq-s-1

Traceback (most recent call last):
  File "../programs/BSseeker2/bs_seeker2-align.py", line 393, in <module>
    options.cut_format
  File "/gpfs/group/dxl46/default/private/renan/programs/BSseeker2/bs_align/bs_rrbs.py", line 218, in bs_rrbs
    seq = l[0]
IndexError: list index out of range

Using python 2.7.14, pysam 0.15.3 and conda 4.3.30. The index building and Antisense script are successful.

The error also occurs with --aligner=bowtie2
