WGLab / LinkedSV

MIT License

Failing to remove sparse nodes #26

Closed: milesandersonmn closed this issue 2 years ago.

milesandersonmn commented 2 years ago

I'm running an analysis on tomato using 10x reads, and my LinkedSV pipeline fails when it runs the remove_sparse_nodes command.

Here is a sample from the end of the output log:

[12/26/2021 12:56:31 (227.287 MB)] N95_fragment_length is: 5857
[12/26/2021 12:57:34 (187.400 MB)] finished getting fragment parameters
[12/26/2021 12:57:34 (186.352 MB)] searching for paired breakpoints
[12/26/2021 12:57:34 (186.352 MB)] searching paired breakpoints
[12/26/2021 12:57:34 (186.352 MB)] building nodes from fragments
[12/26/2021 12:57:34 (186.352 MB)] reading bcd22 file:/data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.bcd22
[12/26/2021 12:58:00 (755.536 MB)] total number of fragments: 1294628
[12/26/2021 12:58:01 (755.536 MB)] writing to node file
[12/26/2021 12:58:38 (187.208 MB)] removing sparse nodes, min_support_fragments is 10
[12/26/2021 12:58:38 (187.208 MB)] Running CMD: /data/proj/chilense/30_genomes_outputs/Miles/LinkedSV/scripts/../bin/remove_sparse_nodes /data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33 /data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33.candidates 5000 /data/proj/chilense/30_genomes_outputs/reference/S_chilense_new/S_chilense_reference_rename.fasta.fai 10
[12/26/2021 13:46:16 (3.293 MB)] ERROR: Failed to run command: /data/proj/chilense/30_genomes_outputs/Miles/LinkedSV/scripts/../bin/remove_sparse_nodes /data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33 /data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33.candidates 5000 /data/proj/chilense/30_genomes_outputs/reference/S_chilense_new/S_chilense_reference_rename.fasta.fai 10
[12/26/2021 13:46:16 (4.215 MB)] Return value is: 9
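One possible reading of "Return value is: 9", offered as an assumption rather than a diagnosis: if the wrapper logs the raw status returned by Python's os.system() (not confirmed in this thread), then on Linux a value of 9 decodes as "terminated by signal 9 (SIGKILL)", a signal that an out-of-memory killer or a job scheduler can send. If the logged value is already a plain exit code, it simply means the binary returned 9. The helper name describe_wait_status is hypothetical; the decoding uses the standard POSIX macros exposed in the os module.

```python
import os
import signal

def describe_wait_status(status: int) -> str:
    """Decode a raw POSIX wait status, as returned by os.system() on Unix.

    Hypothetical helper; assumes the logged value is a raw wait status.
    """
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"

print(describe_wait_status(9))    # killed by signal 9 (SIGKILL)
print(describe_wait_status(256))  # exited with code 1
```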

fangli80 commented 2 years ago

Hello, could you please describe your input data type (WES or WGS) and your command? Thanks, Li

milesandersonmn commented 2 years ago

It is WGS data.

/data/proj/chilense/30_genomes_outputs/Miles/LinkedSV/linkedsv.py -i /data/proj/chilense/30_genomes_outputs/Miles/10xLinks/SI-GA-D6/outs/phased_possorted_bam.bam -r /data/proj/chilense/30_genomes_outputs/reference/S_chilense_new/S_chilense_reference_rename.fasta -d /data/proj/chilense/30_genomes_outputs/Miles -t 40 --somatic_mode --gap_region empty.bed --black_region_bed empty.bed

fangli80 commented 2 years ago

Could you please send the following 3 files to fangli2718@gmail.com so that I can have a test?

/data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33 
/data/proj/chilense/30_genomes_outputs/Miles/phased_possorted_bam.bam.node33.candidates
/data/proj/chilense/30_genomes_outputs/reference/S_chilense_new/S_chilense_reference_rename.fasta.fai

Thanks.

milesandersonmn commented 2 years ago

OK, I thought it might be a disk quota issue, but I ran the process again with more disk space and it still didn't work.
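To test the disk-space hypothesis before rerunning, a minimal sketch like the following reports free space on the filesystem that holds the output directory. It only sees filesystem free space: per-user or per-group quotas are enforced separately and will not show up here, so the cluster's own quota tool remains the authority. The function names are hypothetical.

```python
import shutil

def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30

def has_room(path: str, needed_gib: float) -> bool:
    """True if the filesystem reports at least `needed_gib` GiB free.

    Does NOT account for user/group quotas (assumption-free check of the
    filesystem only).
    """
    return free_gib(path) >= needed_gib

# Example: point this at the LinkedSV output directory (-d) instead of ".".
print(f"{free_gib('.'):.1f} GiB free")
```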

The process doesn't create a node33.candidates file.

Running the process on the cluster's local scratch didn't produce any errors, but it also didn't produce any of the final output files. These are the only files the command produced:

phased_possorted_bam.bam.arguments
phased_possorted_bam.bam.barcode_cov.bed
phased_possorted_bam.bam.barcode_statistics
phased_possorted_bam.bam.bcd21.gz
phased_possorted_bam.bam.bcd22
phased_possorted_bam.bam.bcd22.tmp
phased_possorted_bam.bam.fragment_statistics
phased_possorted_bam.bam.high_cov.bed
phased_possorted_bam.bam.low_mapq.bcd21.gz
phased_possorted_bam.bam.node33
phased_possorted_bam.bam.node35
phased_possorted_bam.bam.node53
phased_possorted_bam.bam.node55
phased_possorted_bam.bam.weird_reads.txt
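Given a list like the one above, a quick way to confirm which intermediate is absent is to check for the expected files directly. This is a sketch: EXPECTED_SUFFIXES is a hypothetical list built only from file names mentioned in this thread, not an official LinkedSV manifest, and an empty file is treated as missing.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical: suffixes taken from file names seen in this thread,
# not from LinkedSV documentation.
EXPECTED_SUFFIXES = (".bcd22", ".node33", ".node33.candidates")

def missing_outputs(out_dir: str, prefix: str,
                    suffixes: Sequence[str] = EXPECTED_SUFFIXES) -> List[str]:
    """Return expected file names that are absent or empty in out_dir."""
    missing = []
    for suffix in suffixes:
        path = Path(out_dir) / f"{prefix}{suffix}"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing

# Example: run from the LinkedSV output directory.
print(missing_outputs(".", "phased_possorted_bam.bam"))
```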

distilledchild commented 1 year ago

@fangli80 @milesandersonmn Did you solve this problem? I got exactly the same error:

ERROR: Failed to run command: /tools/LinkedSV/scripts/../bin/remove_sparse_nodes /projects/hic/2022_F_CTC/sv_detect/linkedsv/output/SHR/SHR_phased_possorted_bam.bam.node33 /projects/hic/2022_F_CTC/sv_detect/linkedsv/output/SHR/SHR_phased_possorted_bam.bam.node33.candidates 5789 /refs/rn7_ucsc/rn7chr.fa.fai 10

fangli80 commented 1 year ago

@theshowmustgolangon Sorry for the inconvenience. I cannot reproduce this problem with my data. Would you mind sharing your dataset with me so that I can test it?

Best, Li

distilledchild commented 1 year ago

@fangli80 I cannot find a node33.candidates file; the process does not seem to create one. I shared my Google Drive with you.

fangli80 commented 1 year ago

@theshowmustgolangon It seems that there is only a .fai file in the shared folder. Could you share the SHR_phased_possorted_bam.bam file?

Thanks, Li

distilledchild commented 1 year ago

@fangli80 Hi, I added the wrong file. I will get back to you after I upload the correct one. Thank you for your support!

distilledchild commented 1 year ago

@fangli80 Hi, I uploaded it to my school OneDrive and shared it with you. Could you check your Gmail, please? I am also uploading a BAM file from another sample; both caused the same error. Could you check them, please?

Thanks, Pete

milesandersonmn commented 1 year ago

Hi, Pete

It's been a long while since I worked with the linked-read data, but if I recall correctly, the problem was that I was using the draft reference instead of the reference files created by the Longranger pipeline. I can't tell from your command, but Longranger should output the reference genome to a path such as "~/refdata-myGenome/fasta/genome.fa".

This genome.fa file should be used as your reference argument (-r). Again, I'm not 100% sure since it's been quite some time, but that is the solution that comes to mind.
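Miles's suggestion can be checked mechanically by comparing the contig names and lengths in the BAM header (the @SQ lines printed by `samtools view -H`) with the .fai index that is passed to remove_sparse_nodes. The sketch below parses both as text; the function names are hypothetical, and this thread does not confirm that a mismatch is what makes remove_sparse_nodes fail.

```python
from typing import Dict, List

def contigs_from_fai(fai_text: str) -> Dict[str, int]:
    """Parse a .fai index: column 1 is the contig name, column 2 its length."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            fields = line.split("\t")
            contigs[fields[0]] = int(fields[1])
    return contigs

def contigs_from_sam_header(header_text: str) -> Dict[str, int]:
    """Parse @SQ lines (SN:name, LN:length) from `samtools view -H` output."""
    contigs = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs

def reference_mismatches(bam: Dict[str, int], fai: Dict[str, int]) -> List[str]:
    """Describe contigs whose presence or length differs between BAM and .fai."""
    problems = []
    for name in sorted(set(bam) | set(fai)):
        if name not in fai:
            problems.append(f"{name}: in BAM header, missing from .fai")
        elif name not in bam:
            problems.append(f"{name}: in .fai, missing from BAM header")
        elif bam[name] != fai[name]:
            problems.append(f"{name}: length {bam[name]} (BAM) vs {fai[name]} (.fai)")
    return problems

# Toy example with made-up contigs:
header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
fai = "chr1\t1000\t6\t60\t61\nctg2\t500\t1030\t60\t61\n"
print(reference_mismatches(contigs_from_sam_header(header), contigs_from_fai(fai)))
# ['chr2: in BAM header, missing from .fai', 'ctg2: in .fai, missing from BAM header']
```

In practice the header text would come from `samtools view -H` on the BAM passed to -i, and the .fai text from the index next to the FASTA passed to -r.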

Good luck, Miles

distilledchild commented 1 year ago

@milesandersonmn Hi Miles, thank you very much for your comment. I checked my command based on your advice, and I think I used the reference genome correctly. I will let you know what caused my error if @fangli80 finds it!

Thank you!

fangli80 commented 1 year ago

I am downloading the BAM file. I will let you know after I test it.

distilledchild commented 1 year ago

@milesandersonmn @fangli80 I ran LinkedSV successfully. The problem was samtools' access to a GCC-related shared library, libstdc++.so.6, in the /usr/lib64 directory. I had been using bedtools and samtools installed via conda; the errors were fixed after I switched to the samtools and bedtools loaded from the HPC server I am using. Thank you for your support, and I hope this helps anyone who gets this error.
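For anyone hitting the same thing, the following sketch shows which samtools and bedtools binaries come first on PATH (for example, a conda environment versus the HPC's own installs) and lists any `ldd` output lines that mention "not found", such as an unresolved or too-old libstdc++.so.6. It assumes a Linux system with ldd available; the function names are hypothetical, and a clean ldd report does not rule out other runtime problems.

```python
import shutil
import subprocess
from typing import List

def unresolved_lines(ldd_output: str) -> List[str]:
    """Return ldd output lines reporting a library or symbol version 'not found'."""
    return [line.strip() for line in ldd_output.splitlines() if "not found" in line]

def check_tool(tool: str) -> None:
    """Print which binary is first on PATH and any unresolved libraries (Linux)."""
    path = shutil.which(tool)
    if path is None:
        print(f"{tool}: not on PATH")
        return
    print(f"{tool}: {path}")
    if shutil.which("ldd") is None:
        print("  ldd not available; skipping library check")
        return
    # ldd can write diagnostics to either stream, so inspect both to be safe.
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    for line in unresolved_lines(result.stdout + result.stderr):
        print(f"  {line}")

for tool in ("samtools", "bedtools"):
    check_tool(tool)
```

Running it once inside the conda environment and once with the HPC-provided tools loaded makes it easy to see which installation LinkedSV would pick up.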

@fangli80 I uploaded another phased_possorted_bam.bam file and shared it with you. I would appreciate any advice on generating a blacklist file.