AntonelliLab / seqcap_processor

Bioinformatic pipeline for processing Sequence Capture data for Phylogenetics
MIT License

--disable_stats missing? #20

Open BenKuhnhaeuser opened 3 years ago

BenKuhnhaeuser commented 3 years ago

Hi Tobias,

I am using version 2.0.2, and am looking in vain for the --disable_stats option described in the documentation. This would be crucial as I already used Trimmomatic with the Maxinfo algorithm but now would want to continue with your pipeline for contig assembly. Any ideas how to get this to work?

Many thanks in advance, Ben

tandermann commented 3 years ago

Hi Ben, I have to update the documentation: the --disable_stats flag no longer exists in SECAPR 2.0 and newer. The function automatically figures out whether or not a stats file exists, so you should be able to just run the secapr assemble_reads function without problems (assuming that your read files are arranged in the required input folder structure). Let me know if you run into problems. I'm trying to make SECAPR more flexible in terms of entering at different parts of the workflow, so any feedback is appreciated! Best, Tobias

tandermann commented 3 years ago

For an example of the required input folder structure, you can check out this example: https://drive.google.com/file/d/1IUVqNh-EZKTmNYXWnO60UEs81bhnsv_t/view?usp=sharing The important things are having a separate subfolder for each sample, named sampleID_clean (change sampleID accordingly), and having your cleaned fastq files for the respective sample in that folder, with each filename starting with sampleID and containing READ1 or R1 and, in case of paired-end data, READ2 or R2. The fastq files should also be unzipped.
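The layout rules above can be checked before running the pipeline. The following is only a rough sketch and not part of SECAPR: the function name `check_clean_reads_layout`, the `paired_end` switch, and the assumption that files must end in `.fastq` are all hypothetical and may need adjusting to your setup.

```python
from pathlib import Path

# Assumption: only this file ending is accepted; adjust if your setup differs.
FASTQ_SUFFIXES = {".fastq"}


def check_clean_reads_layout(root, paired_end=True):
    """Return a list of problems found in the cleaned-reads folder `root`.

    Hypothetical helper, not a SECAPR function. An empty list means the
    folder matches the layout described in the comment above.
    """
    problems = []
    sample_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not sample_dirs:
        problems.append(f"{root}: no sample subfolders found")

    for folder in sample_dirs:
        # Each sample needs its own subfolder named <sampleID>_clean.
        if not folder.name.endswith("_clean") or folder.name == "_clean":
            problems.append(f"{folder.name}: expected a folder named <sampleID>_clean")
            continue
        sample_id = folder.name[: -len("_clean")]
        files = [f for f in folder.iterdir() if f.is_file()]

        # Compressed read files are reported, since they should be unzipped.
        for f in files:
            if f.suffix == ".gz":
                problems.append(f"{folder.name}/{f.name}: fastq files should be unzipped")

        # Search for the read tag only after the sample ID, so that an ID
        # such as "R1b" is not mistaken for a READ1 marker.
        tails = [
            f.name[len(sample_id):]
            for f in files
            if f.name.startswith(sample_id) and f.suffix in FASTQ_SUFFIXES
        ]
        wanted = [("READ1", "R1")]
        if paired_end:
            wanted.append(("READ2", "R2"))
        for long_tag, short_tag in wanted:
            if not any(long_tag in t or short_tag in t for t in tails):
                problems.append(
                    f"{folder.name}: no fastq file starting with '{sample_id}' "
                    f"and containing {long_tag} or {short_tag}"
                )
    return problems
```

Calling `check_clean_reads_layout("trimmed_reads")` (the folder name here is just an illustration) returns one message per problem, which can be printed before starting `secapr assemble_reads`.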

BenKuhnhaeuser commented 3 years ago

Hi Tobias,

Thank you for the swift response. This was indeed just about the directory structure (and file endings, too). The example directory has been extremely helpful, and it now has successfully completed building contigs. I think it would be very beneficial if you put the required folder structure and file names for each step into the documentation. Every pipeline has its own naming conventions, and it's almost impossible to anticipate what they will be.

With the files not in the right directory structure, I got the following error: FileNotFoundError: [Errno 2] No such file or directory: '/beegfs/scratch/scratchFS/users_area/bk12kg/analyses/calamoideae/secapr_test/trimmed_reads/sample_stats.txt'. That's what confused me. Maybe the error message can be amended?

All the best, Ben

mustafaraza1987 commented 3 years ago

Hi Tobias,

secapr reference_assembly --reads 00_raw --reference_type sample-specific --reference 03_alignments_len100 --output 04_reference_assembly --min_coverage 3

###########################################

I am running this command and got the error in both (sample-specific, alignment-consensus).

Creating reference library for CUC_Blastania
sh: /bin/cat: Argument list too long

##################################################

Processing sample CUC_Blastania
Mapping...
Converting to bam...
Indexing bam...
Removing duplicate reads with samtools...
mv: rename /Users/mustafa/Desktop/refrence_based/04_reference_assembly/CUC_Blastania_remapped/including_duplicate_reads/_no_dupls_sorted.bam to /Users/mustafa/Desktop/refrence_based/04_reference_assembly/CUC_Blastania_remapped/_no_dupls_sorted.bam: No such file or directory
Indexing duplicate-free bam...
[E::hts_open_format] Failed to open file /Users/mustafa/Desktop/refrence_based/04_reference_assembly/CUC_Blastania_remapped/CUC_Blastania_no_dupls_sorted.bam
samtools index: failed to open "/Users/mustafa/Desktop/refrence_based/04_reference_assembly/CUC_Blastania_remapped/CUC_Blastania_no_dupls_sorted.bam": No such file or directory
Generating a consensus sequence from bam-file...
Generating a consensus sequence from bam-file...
Creating reference library for ANI_Combretocarpus
sh: /bin/cat: Argument list too long

#########################################

Unable to solve the error, please suggest.