WeiWei060512 / NUMTs-detection

Detecting NUMTs from WGS

Error with file name when running searchBreakpoint_fromblatoutputs.py #3

Open ehsansari1234 opened 1 year ago

ehsansari1234 commented 1 year ago

Hi.

Thanks for sharing these very interesting scripts. I have two questions that should help me resolve the errors I am getting:

1) In NUMTs_detection.sh, I have a difficult time understanding the ${OUTPUT_wgs} variable. It is not defined in the script, so when I run the script on my BAM file, the split file is not generated. When I change it to ${OUTPUT}, I get 159a1srt.mt.split.sam in the output folder as expected.

2) The pipeline then progresses until it reaches the searchBreakpoint_fromblatoutputs.py script, where I get the following errors:

{
look for cluster's done, move to look for breakpoints
159a1srt
[E::hts_open_format] Failed to open file "159a1srt" : No such file or directory
samtools view: failed to open "159a1srt" for reading: No such file or directory
End of file reading 4 bytes
Traceback (most recent call last):
}

I am not sure what the source of the issue is. Could you please help? I am concerned that there may be some restrictions on file naming. Note that I am using a non-human genome, and my chromosomes are named { chr1, chr2, chr3, chr4, chr_mt }. To accommodate the difference in chromosome names, I changed the grep part of the NUMTs_detection.sh script as follows: { grep -e @ -e MT -e chr_mt }
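For reference, grep matches its patterns anywhere on a line, so a pattern such as MT can also hit read names or other fields. A stricter alternative is to compare only the reference-name columns of each SAM record. The following is a minimal Python 3 sketch of that idea, not part of the pipeline: the function names and the `MT_NAMES` set are hypothetical, and it assumes standard tab-separated SAM records where column 3 is RNAME and column 7 is RNEXT.

```python
# Hypothetical set of mitochondrial contig names; adjust to your reference.
MT_NAMES = {"MT", "chrM", "chr_mt"}


def keep_sam_line(line, mt_names=MT_NAMES):
    """Return True for header lines and for reads touching a mitochondrial contig."""
    if line.startswith("@"):
        # SAM header lines always start with "@"
        return True
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        # Not a complete alignment record; drop it
        return False
    rname, rnext = fields[2], fields[6]
    # RNEXT is "=" when the mate maps to the same contig as the read,
    # in which case RNAME alone decides.
    return rname in mt_names or rnext in mt_names


def filter_sam(lines, mt_names=MT_NAMES):
    """Keep only the lines accepted by keep_sam_line."""
    return [ln for ln in lines if keep_sam_line(ln, mt_names)]
```

Applied to the text output of `samtools view -h`, this keeps the header plus any read whose own contig or mate contig is in the set, regardless of what appears in the read name or quality string.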

Please let me know if you need any other information.

Best Ehsan

WeiWei060512 commented 1 year ago

Hi Ehsan, Thanks for reaching out. Have you managed to run through "searchNumtCluster_fromDiscordantReads.py"?

Wei

ehsansari1234 commented 1 year ago

Hi Wei,

Thanks for the quick reply.

I get some warnings when the pipeline reaches searchNumtCluster_fromDiscordantReads.py, but I believe it worked, as 159a1srt.mt.disc.sam.breakpointINPUT.tsv is generated in the output folder. Here are a few lines of the breakpointINPUT.tsv file:

{
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 1005205 1006728
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 10057554 10059006
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 10059036 10060651
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 1015885 1017220
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 10174241 10175740
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 10186464 10187796
159a1srt 2 ./output/159a1srt.mt.disc.sam ./output/159a1srt.mt.split.sam ./159a1srt.bam chr1 10244730 10246102
}

Please let me know should you need further info.

Thanks Ehsan

WeiWei060512 commented 1 year ago

Hi Ehsan, It looks like samtools can't find its input. You may need to double-check your input files.
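One way to double-check the inputs is to confirm that every file named in breakpointINPUT.tsv actually exists. The sketch below is a hypothetical Python 3 helper, not part of the pipeline; it assumes the file paths sit in the third to fifth columns (as in the sample rows quoted in this thread) and that relative paths are resolved from the directory the script is run in.

```python
import csv
import os


def missing_inputs(table_path, path_columns=(2, 3, 4), delimiter="\t"):
    """Return the sorted file paths named in the table that do not exist on disk."""
    missing = set()
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            for idx in path_columns:
                # Skip rows that are shorter than expected instead of failing
                if idx < len(row) and not os.path.exists(row[idx]):
                    missing.add(row[idx])
    return sorted(missing)
```

An empty list means every listed file was found. Note that an empty list is also returned when the delimiter does not match the file, because each row then parses as a single column, so it is worth confirming the delimiter first.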

Wei

ehsansari1234 commented 1 year ago

Hi Wei,

Thanks. Your clue helped me find the issue. The breakpointINPUT.tsv file is tab-delimited, but a comma-separated input is expected by the breakpoint detection part. Now I am out of that error loop, but I get numerous complaints from Pandas. May I know the versions of Pandas and Python on your system? Here is the returned error message:

{
  File "/globalhome/ehs220/HPC/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 3426, in _ensure_valid_index
    raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
}
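The delimiter mismatch described above can be worked around by rewriting the table with commas before the breakpoint step. This is a minimal Python 3 sketch using only the standard `csv` module; the function name `tsv_to_csv` is hypothetical, and whether the downstream script accepts this exact layout is an assumption based on the report above rather than something verified against the script.

```python
import csv


def tsv_to_csv(src_path, dst_path):
    """Copy a tab-delimited table to a comma-separated one; return the row count."""
    rows_written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        # Use "\n" line endings rather than the csv module default of "\r\n"
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Writing to a new file leaves the original breakpointINPUT.tsv untouched, so the two versions can be compared if the later step still fails.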

Best Ehsan

jingydz commented 5 months ago

Hi, I encountered the same error as you, but with Python 3.6. Did you manage to resolve this issue in the end?