Open dylanHco opened 2 weeks ago
Please send me the command, direct input file (this should be a list of fasta files), and the log. This will help me diagnose the problem. Thanks.
Here is the command: /projects/p31913/ASTER/bin/waster-site -i in3 -u 1 -t 4 -k 8 -o guidetest1.tre 2>a2.log
Log out:
Without-Alignment/Assembly Species Tree EstimatoR † (site)
Version: v1.16.1.0
Make sure you have run 'waster-site -h', read about '-k' command, and ensured you have enough memory to proceed!
Quality control: Masking all SNP bases with quality lower than '?' for FASTQ inputs.
Quality control: Masking all non-SNP bases with quality lower than '5' for FASTQ inputs.
Species A_longiflora_ENG_S44 is selected to count the most frequent patterns.
Hash table 0% filled.
Species /projects/p31913/Trim_outs/A_palmeri_1_S48/*.fq is selected to count the most frequent patterns.
File A_palmeri_5_S43 bad format!
Currently waster does not support wildcard patterns such as *.fq. If you have multiple files for the same sample, please cat them into one file.
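For anyone who would rather script the concatenation than run cat by hand, here is a minimal Python sketch. The function name `concat_files` and the paths in the usage note are hypothetical and not part of waster; the sketch only joins the files byte for byte, which is what cat does.

```python
import glob
import os
import shutil


def concat_files(pattern, out_path):
    """Concatenate every file matching `pattern` into `out_path`, in sorted order.

    Returns the list of input files that were used.
    """
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it already exists and matches the pattern.
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != out_abs]
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # stream the bytes, like `cat`
    return paths
```

A call might look like `concat_files("sample_dir/*.fq", "sample_dir/sample.merged.fastq")`, after which the single merged file, not the wildcard, goes into the input list.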
I have tried that too - and I get the same error.
Can I see the input and log file?
Without-Alignment/Assembly Species Tree EstimatoR † (site)
Version: v1.16.1.0
Make sure you have run 'waster-site -h', read about '-k' command, and ensured you have enough memory to proceed!
Quality control: Masking all SNP bases with quality lower than '?' for FASTQ inputs.
Quality control: Masking all non-SNP bases with quality lower than '5' for FASTQ inputs.
Species /projects/p31913/Trim_outs/A_tabernaemontana_repens_S31/A_tabernaemontana_repens_S31.merged.fasta is selected to count the most frequent patterns.
File A_tharpii_P33_S26 bad format!
I see. This is maybe counter-intuitive, but in your input file try the following format instead:
/projects/p31913/Trim_outs/A_arenaria_TX7_S43/A_arenaria_TX7_S43.merged.fasta A_arenaria_TX7_S43
/projects/p31913/Trim_outs/A_ciliata_texanaH17_S47/A_ciliata_texanaH17_S47.merged.fasta A_ciliata_texanaH17_S47
......
I still get the same error.
Without-Alignment/Assembly Species Tree EstimatoR † (site)
Version: v1.16.1.0
Make sure you have run 'waster-site -h', read about '-k' command, and ensured you have enough memory to proceed!
Quality control: Masking all SNP bases with quality lower than '?' for FASTQ inputs.
Quality control: Masking all non-SNP bases with quality lower than '5' for FASTQ inputs.
Species A_rigida_H10_S46 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_tomentosa_tomentosa_4_S29 is selected to count the most frequent patterns.
Hash table 0% filled.
Species /projects/p31913/Trim_outs/A_grandiflora_1_S32/A_grandiflora_1_S32.merged.fasta is selected to count the most frequent patterns.
File A_grandiflora_1_S32 bad format!
Try this input: inputfileA.txt
Without-Alignment/Assembly Species Tree EstimatoR † (site)
Version: v1.16.1.0
Make sure you have run 'waster-site -h', read about '-k' command, and ensured you have enough memory to proceed!
Quality control: Masking all SNP bases with quality lower than '?' for FASTQ inputs.
Quality control: Masking all non-SNP bases with quality lower than '5' for FASTQ inputs.
Species /projects/p31913/Trim_outs/A_tabernaemontana_repens_S31/A_tabernaemontana_repens_S31.merged.fasta is selected to count the most frequent patterns.
File A_tabernaemontanaB_repens_S31 bad format!
Weird. What about this one: inputfileA.txt?
Without-Alignment/Assembly Species Tree EstimatoR † (site)
Version: v1.16.1.0
Make sure you have run 'waster-site -h', read about '-k' command, and ensured you have enough memory to proceed!
Quality control: Masking all SNP bases with quality lower than '?' for FASTQ inputs.
Quality control: Masking all non-SNP bases with quality lower than '5' for FASTQ inputs.
Species A_tabernaemontanaB_repens_S31 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_tabernaemontanaA_H9_S39 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_rigida_H10_S46 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_ciliata_texanaH17_S47 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_hubrichtii_2_S30 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_tomentosa_tomentosa_4_S29 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_mystery_2_S42 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_palmeriB_S48 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_tharpii_P33_S26 is selected to count the most frequent patterns.
Hash table 0% filled.
Species Rhazya_stricta_S10 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_longiflora_OV_S11 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_ciliata_texana_S6 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_grandiflora_2_S36 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_fugatei_F25_S32 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_rigida_3_S15 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_longiflora_ENG_S44 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_arenaria_TX7_S43 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_fugatei_2_S11 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_kearnyana_2CKE_S18 is selected to count the most frequent patterns.
Hash table 0% filled.
Species A_palmeriA_S4 is selected to count the most frequent patterns.
File /projects/p31913/Trim_outs/A_palmeri_1_S4/A_palmeri_1_S4.merged.fasta bad format!
Some progress. Please send me /projects/p31913/Trim_outs/A_palmeri_1_S4/A_palmeri_1_S4.merged.fasta if that is small enough. If it is very large, send me the first 10,000 lines.
Ok - so the previous file you sent me works. I was missing the merged.fasta in that folder, and the program started to work once I added it. No other merged fastas were missing, so whatever formatting you used in that input file got it to work.
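Since the last "bad format" error came from a file that did not exist, a quick pre-flight check of the input list can rule that cause out before running waster. The sketch below is only an illustration: `find_missing_paths` is a hypothetical helper, and it assumes that any whitespace-separated token containing a "/" is meant to be a file path. It makes no assumption about which column holds the path.

```python
import os


def find_missing_paths(list_file):
    """Return (line_number, token) pairs for path-like tokens that are not existing files."""
    missing = []
    with open(list_file) as handle:
        for line_number, line in enumerate(handle, start=1):
            for token in line.split():
                # Assumption: a token with a path separator is meant to be a file.
                # Unexpanded wildcards such as /some/dir/*.fq are reported as well.
                if "/" in token and not os.path.isfile(token):
                    missing.append((line_number, token))
    return missing
```

Running it on the list passed to `-i` (in3 in the command above) and printing each returned pair shows which entries point at files that are not there.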
Hello - I am trying to use Waster to generate a tree for input into the Cactus aligner. I am trying to align chloroplast genomes from several different species. Instead of using raw fastq files, I first run them through GetOrganelle (https://github.com/Kinggerm/GetOrganelle) to extract only the reads associated with the chloroplast, and I build the tree from those. GetOrganelle outputs these chloroplast reads as paired-end fastq files, and I use BBMerge to merge the paired-end files together. However, when I try to use these with waster, I get "bad format" errors and I am not sure why. Below are the first few lines of a merged fasta:
Thanks for any suggestions! Dylan
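One generic sanity check for a merged file is to look at its first non-empty line: FASTA records start with ">" and FASTQ records start with "@". The sketch below does only that. `sniff_sequence_format` is a hypothetical helper, not part of waster, and it does not reproduce whatever validation waster itself performs.

```python
import gzip


def sniff_sequence_format(path):
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-empty line of a file."""
    # Detect gzip compression by its two-byte magic number.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"  # empty file
```

If a file named *.merged.fasta comes back as "fastq" or "unknown", the extension and the contents disagree, which is worth fixing before looking for other causes.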