nicolashazzi23 opened 1 year ago
You will likely need to check the log files output by samtools/bamtools. The image attached is somewhat hard to see, but it looks like some files are not being created properly. Once you have fasta files, they can both go in the same folder. Extracting SNPs is up to you - e.g. depending on your needs, you can choose where to harvest the SNP calls from... or you can choose to additionally alter/update the files produced to output SNPs.
Hi Brant, thank you very much for your help. This is the tail (last 50 lines) of the run showing the error. It seems to be a memory capacity error, but I ran the job with the maximum capacity of our slurm cluster: "highMem – Nodes in this category have large memory – 3tb and are for jobs that require more memory intensive jobs", and I still got this error.
thanks!
Finished processing comp53472_c0_seq1:1-252
Processing comp53486_c0_seq1:1-333
bam bams/HW_0458.0.bam: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(SAMTextHeaderCodec.java:139)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:94)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:667)
at htsjdk.samtools.BAMFileReader.
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message
Looks like you need to feed pilon some more RAM (running it on a large node is usually not quite enough). This should only require modifying the workflow script in the pilon section here to read like:
pilon -jar -Xmx256G --threads {threads}...

Where you'll change the 256G to something that works for your HPC. This sets the max RAM pilon can use (by default, 1 G).
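For context, a sketch of what the edited shell section of the phasing Snakefile might look like (the rule layout, input/wildcard names, and the 256G value are illustrative, not copied from the workflow; note also that a later comment in this thread reports the bioconda pilon wrapper rejects the -jar flag, so only the -Xmx flag is shown here):

```shell
# Illustrative fragment of the phasing workflow's pilon shell line;
# only the -Xmx flag is the actual change being suggested.
pilon -Xmx256G --threads {threads} --vcf --changes \
    --fix snps,indels --minqual 10 --mindepth 5 \
    --genome {input.contigs} --bam {input.bam} \
    --outdir fastas --output {wildcards.sample}.0
```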
Hi Brant, thank you very much! It worked thanks to your suggestion!
Excellent 👍
Hi Brant, sorry for bothering you again, but I would like to kindly ask about the 0.fasta and 1.fasta files generated by the phasing process. I want to estimate species trees and also get SNPs for a Structure analysis. When I put the 0.fasta and 1.fasta files in the same folder as you suggested and ran phyluce_assembly_match_contigs_to_probes, I got the following error: "sqlite3.OperationalError: duplicate column name: HW_0302". Should I merge the 0.fasta and 1.fasta files using cat? Or what else should I do with the 0.fasta and 1.fasta files after phasing? Thanks in advance!
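For readers hitting the same duplicate-column error: one possible workaround (an assumption on my part, not confirmed in this thread) is that the error arises because both allele files carry the same taxon name, so giving each allele a unique taxon name before matching may help. A minimal sketch with hypothetical file names and headers:

```shell
# Hypothetical example data: two allele FASTAs for taxon HW_0302
mkdir -p fastas phased-contigs
printf '>HW_0302_comp1\nACGT\n' > fastas/HW_0302.0.fasta
printf '>HW_0302_comp1\nACGA\n' > fastas/HW_0302.1.fasta

for f in fastas/*.fasta; do
    base=$(basename "$f" .fasta)   # e.g. HW_0302.0
    taxon=${base%.*}               # HW_0302
    allele=${base##*.}             # 0 or 1
    # append the allele number to the taxon name in both the
    # filename and the FASTA headers so the names are unique
    sed "s/${taxon}/${taxon}_${allele}/g" "$f" \
        > "phased-contigs/${taxon}_${allele}.fasta"
done
```

The renamed files in one folder can then be treated as the new "assemblies" when restarting from phyluce_assembly_match_contigs_to_probes.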
Hello Brant. I am also running into problems with the phasing workflow. It keeps failing here:
[Thu Nov 7 09:48:18 2024]
rule pilon_allele_1:
    input: /home/kwgray/Vanuatu/allVanuatu/contigs/Strumigenys_rogeri_VAN342.contigs.fasta, bams/Strumigenys_rogeri_VAN342.1.bam, bams/Strumigenys_rogeri_VAN342.1.bam.bai
    output: fastas/Strumigenys_rogeri_VAN342.1.fasta
    jobid: 441
    wildcards: sample=Strumigenys_rogeri_VAN342
[Thu Nov 7 09:48:23 2024]
Error in rule pilon_allele_1:
    jobid: 441
    output: fastas/Strumigenys_rogeri_VAN342.1.fasta
    shell:
        pilon --threads 1 --vcf --changes --fix snps,indels --minqual 10 --mindepth 5 --genome /home/kwgray/Vanuatu/allVanuatu/contigs/Strumigenys_rogeri_VAN342.contigs.fasta --bam bams/Strumigenys_rogeri_VAN342.1.bam --outdir fastas --output Strumigenys_rogeriVAN342.1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Both of the BAM files are present (bams/Strumigenys_rogeri_VAN342.1.bam, bams/Strumigenys_rogeri_VAN342.1.bam.bai) from the phasing workflow. However, fastas/Strumigenys_rogeri_VAN342.1.fasta is not present when the workflow fails at this step. I am not sure if it is a BAM index file issue, as you previously suggested in a different issue thread (#302), or maybe a RAM shortage. I already increased the RAM allocation to 100G as suggested earlier in this thread, but it still failed at this step. My dataset includes 60 samples, so I would assume 100G of RAM is enough, but maybe this is the problem. Do you have any other suggestions?
Thank you very much in advance, Kyle
Did things run ok for your other files (sounds sort of like they did)...? If so, it's likely to be a RAM issue - Pilon uses a TON of RAM in certain cases.
Yes, all of the files were generated before the fasta file step. I'll try doing the analysis on a server with more RAM to see if that does the trick. Thank you!
Apologies / I wasn't super clear - do you have other individuals in the run that worked (e.g. produced output from Pilon)?
No other output was produced from pilon. It keeps failing at the first individual (error message above).
Perhaps try an individual with less data to see if that one works - the error may be with Pilon (and not due to RAM).
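To pick the individual with the least data, sorting the per-allele BAMs by size is a quick check (the file names below are hypothetical stand-ins for real samples):

```shell
mkdir -p bams
# hypothetical BAMs of different sizes, standing in for real samples
head -c 1000 /dev/zero > bams/big.0.bam
head -c 10   /dev/zero > bams/small.0.bam
# list smallest first; try that sample alone in phasing.conf
ls -Sr bams/*.0.bam
```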
Pilon is now successfully running, but my phasing.conf now includes only one sample as a test, and it's a different sample from before. You are correct that it uses A LOT of RAM... pilon for one sample is using 36G of RAM. What I changed was the Snakemake file, as you previously suggested, to include -Xmx100G (our server has a max of 120G RAM). I think adding '-jar' to the pilon command, as you previously suggested, causes problems based on my error report, so I excluded '-jar' and now it's running:
Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Unknown option -jar
[Fri Nov 8 09:40:12 2024]
Error in rule pilon_allele_0:
    jobid: 7
    output: fastas/Eurhopalothrix_procera_VAN104.0.fasta
    shell:
        pilon -jar -Xmx100G --threads 1 --vcf --changes --fix snps,indels --minqual 10 --mindepth 5 --genome /home/kwgray/Vanuatu/allVanuatu/contigs/Eurhopalothrix_procera_VAN104.contigs.fasta --bam bams/Eurhopalothrix_procera_VAN104.0.bam --outdir fastas --output Eurhopalothrix_procera_VAN104.0
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Overall, it appears to be a RAM limitation, as you suggested.
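For anyone hitting the same "Unknown option -jar" error: the invocation from the error log above with -jar removed (the paths and the 100G value are specific to that run) would read:

```shell
pilon -Xmx100G --threads 1 --vcf --changes --fix snps,indels \
    --minqual 10 --mindepth 5 \
    --genome /home/kwgray/Vanuatu/allVanuatu/contigs/Eurhopalothrix_procera_VAN104.contigs.fasta \
    --bam bams/Eurhopalothrix_procera_VAN104.0.bam \
    --outdir fastas --output Eurhopalothrix_procera_VAN104.0
```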
Hi, I am trying to run the phasing workflow with the BAM files generated previously by the mapping workflow. However, I am not getting the fasta files in the results; I only get the bam files (see attached image). I am attaching the conf file in case it helps clarify whether I am doing something wrong. My final question is about how to construct the final SNP matrix, because the tutorial says this after phasing: "You can essentially group all the .0.fasta and .1.fasta files for all taxa together as new “assemblies” of data and start the phyluce analysis process over from phyluce_assembly_match_contigs_to_probes.". I find this kind of ambiguous: should I create one contigs folder with the 0.fasta files and a second folder with the 1.fasta files, or one folder with both files together? And, at the end, how can I extract the SNPs?
this is the command that I ran
phyluce_workflow --config bams_2.conf \
    --output phasing_3 \
    --workflow phasing \
    --cores 1 phasing.txt