asafpr / RILseq

RILseq computational protocol
MIT License
9 stars 10 forks source link

Fail to run RIL-seq #11

Closed peng7418520 closed 6 years ago

peng7418520 commented 6 years ago

Dear author,

I am trying to use your package RIL-seq for my bacterial chimeric read analysis. I believe I installed correctly and ran correctly but failed in the very last step somewhere.

I load my modules SAMTools/1.2, bwa 0.7.12-r1044, PySAM/0.6, PySAM/0.6, SciPy/0.9.0, Biopython/1.65 and run the first step as "RILseq-0.57/bin/map_single_fragments.py genome.fna -g annotation.gtf -1 trimadapt_seq3PE2_S3_1P.fastq.gz -2 trimadapt_seq3PE2_S3_2P.fastq.gz -d test1 -o test2 -m 1"

The program starts to run fine for like a minute until the end that I got this error: [bam_sort_core] merging from 3 files... [E::hts_open] fail to open file 'test1/trimadapt_seq3PE2_S3_1P.fastq_bwa.bam' Traceback (most recent call last): File "RILseq-0.57/bin/map_single_fragments.py", line 153, in status = main() File "RILseq-0.57/bin/map_single_fragments.py", line 124, in main samfile = pysam.Samfile(bamname) File "csamtools.pyx", line 463, in csamtools.Samfile.cinit (pysam/csamtools.c:4758) File "csamtools.pyx", line 496, in csamtools.Samfile._open (pysam/csamtools.c:5234) File "csamtools.pyx", line 593, in csamtools.Samfile._open (pysam/csamtools.c:6212) IOError: file test1/trimadapt_seq3PE2_S3_1P.fastq_bwa.bam not found

I only know very basic things about python. I tried to read your "map_single_fragments.py" script and I believe the error came from line 119 to 124 about the "bamname".

I did some searches and it appeared to me that you used a python package called "tempfile" to store the temporary file which has several versions from python2 to python3. Maybe the python version I am using is wrong and this caused the problem. Could you please let me know what do you think and what exact python version you used to develop the package?

Really appreciate your kind help!!!

asafpr commented 6 years ago

I suspect that bwa didn't even run hence you didn't get the output bam file. can you just run "bwa"and see if there is any output? If it fails, please make sure bwa is in a $PATH directory or supply the direct link (e.g. /xxx/yyy/bwa) using the --bwa_exec

On Sun, Mar 4, 2018 at 11:28 PM, peng7418520 notifications@github.com wrote:

Dear author,

I am trying to use your package RIL-seq for my bacterial chimeric read analysis. I believe I installed correctly and ran correctly but failed in the very last step somewhere.

I load my modules SAMTools/1.2, bwa 0.7.12-r1044, PySAM/0.6, PySAM/0.6, SciPy/0.9.0, Biopython/1.65 and run the first step as "RILseq-0.57/bin/map_single_fragments.py genome.fna -g annotation.gtf -1 trimadapt_seq3PE2_S3_1P.fastq.gz -2 trimadapt_seq3PE2_S3_2P.fastq.gz -d test1 -o test2 -m 1"

The program starts to run fine for like a minute until the end that I got this error: [bam_sort_core] merging from 3 files... [E::hts_open] fail to open file 'test1/trimadapt_seq3PE2S3 1P.fastq_bwa.bam' Traceback (most recent call last): File "RILseq-0.57/bin/map_single_fragments.py", line 153, in status = main() File "RILseq-0.57/bin/map_single_fragments.py", line 124, in main samfile = pysam.Samfile(bamname) File "csamtools.pyx", line 463, in csamtools.Samfile.cinit (pysam/csamtools.c:4758) File "csamtools.pyx", line 496, in csamtools.Samfile._open (pysam/csamtools.c:5234) File "csamtools.pyx", line 593, in csamtools.Samfile._open (pysam/csamtools.c:6212) IOError: file test1/trimadapt_seq3PE2_S3_1P.fastq_bwa.bam not found

I only know very basic things about python. I tried to read your "map_single_fragments.py" script and I believe the error came from line 119 to 124 about the "bamname".

I did some searches and it appeared to me that you used a python package called "tempfile" to store the temporary file which has several versions from python2 to python3. Maybe the python version I am using is wrong and this caused the problem. Could you please let me know what do you think and what exact python version you used to develop the package?

Really appreciate your kind help!!!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/asafpr/RILseq/issues/11, or mute the thread https://github.com/notifications/unsubscribe-auth/AAxQURrfTkCglPxUUwyeIvKaCKLrbJr4ks5tbFxrgaJpZM4Sbgo- .

peng7418520 commented 6 years ago

Thanks for the quick response. I believe bwa was working fine. I send you the complete error message file as attached. As you can see, bwa was working. But for some reason I was not getting the output file.

Also, could you please provide the version you used for pysam, numpy, scipy and biopython? I could not find them in the README file.

error_message.txt

Thanks for the help! I am very excited about your package!!!

asafpr commented 6 years ago

I see, I think the problem is that -d test1 is not an absolute path, try replacing it with the full path and try again. I will fix it in the code (convert to absolute path) later.

On Mon, Mar 5, 2018 at 4:26 PM, peng7418520 notifications@github.com wrote:

Thanks for the quick response. I believe bwa was working fine. I send you the complete error message file as attached. As you can see, bwa was working. But for some reason I was not getting the output file.

Also, could you please provide the version you used for pysam, numpy, scipy and biopython? I could not find them in the README file.

error_message.txt https://github.com/asafpr/RILseq/files/1781150/error_message.txt

Thanks for the help! I am very excited about your package!!!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/asafpr/RILseq/issues/11#issuecomment-370435208, or mute the thread https://github.com/notifications/unsubscribe-auth/AAxQUTosftgyyXeGqhOGBvyDaQwJ1xjcks5tbUsOgaJpZM4Sbgo- .

peng7418520 commented 6 years ago

Hey, sorry for the late response. I have been working on it and finally got RIL-seq worked! Thanks for the help! It turned out loading the right versions of the dependent modules is the key. Also, it seems by default it is always running under your path which needs to be changed to work properly.

I have another question. I am not sure if I should open other issue or I can ask here. I am not totally clear about the output. The output column include "Start of RNA1 first read“ and ”Start of RNA1 last readRNA1“. If they mean what they literally say, then how can I know the start site and end site of RNA1? Because some of my output lines have exactly the same values for these two column which makes me confused.

Thanks for the help!

asafpr commented 6 years ago

I'm glad it works now, I'll change the code accordingly. As to your second question, we don't know where the reads end since we only use the first 25 nucleotides for mapping. We look at the first nucleotide of each read and see where it maps to, we then accumulate all reads mapped to the same region (in 100 bp non-overlapping windows) and report for each interaction where the first nt of reads span along this region. If the value in both columns is the same it means that all reads mapped to this region in this interaction are mapped to the same position.

On Sun, Mar 18, 2018 at 3:23 AM, peng7418520 notifications@github.com wrote:

Hey, sorry for the late response. I have been working on it and finally got RIL-seq worked! Thanks for the help! It turned out loading the right versions of the dependent modules is the key. Also, it seems by default it is always running under your path which needs to be changed to work properly.

I have another question. I am not sure if I should open other issue or I can ask here. I am not totally clear about the output. The output column include "Start of RNA1 first read“ and ”Start of RNA1 last readRNA1“. If they mean what they literally say, then how can I know the start site and end site of RNA1? Because some of my output lines have exactly the same values for these two column which makes me confused.

Thanks for the help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/asafpr/RILseq/issues/11#issuecomment-373965464, or mute the thread https://github.com/notifications/unsubscribe-auth/AAxQUQpRF_Dh6HpYxkD4ObYviSHNE45pks5tfbb7gaJpZM4Sbgo- .

peng7418520 commented 6 years ago

Hey, thanks for the explanation. I am thinking it would be better, thought might be difficult, to have the full length interaction region of RNA1 and RNA2. For example, 1 to 70 bp of the read is mapped to region one and 71 to 150 bp is mapped to region two. This gives more information about where exactly the sRNAs bind to in the target RNAs.

For now, I think I can use the "Start of RNA1 first read“ info to search against the chimeric reads information obtained from map_chimeric_fragments.py and find the corresponding read name. Then I can go to the raw data to find that read and do a blast search to find the mapped two region and where they split exactly. Thanks!

asafpr commented 6 years ago

I'll think about adding that. If you'll generate bed files from the chimeric reads with the dedicated script it will give you the ranges of each read. This is of course limited to the region being sequenced (using paired-end) and subject to RNase digestion so the interpretation is not straightforward.

On Sun, Mar 18, 2018 at 11:00 PM, peng7418520 notifications@github.com wrote:

Hey, thanks for the explanation. I am thinking it would be better, thought might be difficult, to have the full length interaction region of RNA1 and RNA2. For example, 1 to 70 bp of the read is mapped to region one and 71 to 150 bp is mapped to region two. This gives more information about where exactly the sRNAs bind to in the target RNAs.

For now, I think I can use the "Start of RNA1 first read“ info to search against the chimeric reads information obtained from map_chimeric_fragments.py and find the corresponding read name. Then I can go to the raw data to find that read and do a blast search to find the mapped two region and where they split exactly. Thanks!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/asafpr/RILseq/issues/11#issuecomment-374046649, or mute the thread https://github.com/notifications/unsubscribe-auth/AAxQUQAhs2OG0sGKZ_vmkASFF5vPxcaRks5tfsrWgaJpZM4Sbgo- .

peng7418520 commented 6 years ago

Thanks Asafpr! I should be able to figure out how to do this.

My research purpose is a little bit different. I don't do Hfq and RNase digestion stuff. But my sequencing result purposely generate a lot of chimeric reads. That's why I want to know the exact boundaries of the chimeras which will be combined with my normal RNA-seq result for annotation purpose. And I find RIL-seq pipelines fit my purpose well! I especially like the Fisher' exact part because my chimeric reads include a lot of background noises such as chimeric reads generated from library-prep and sequencing, and ribosomal RNAs. So, it is nice that RIL-seq can be used for other works!

Thanks again and I am closing this issue.