biocorecrg / MOP2

Master of Pores 2
https://biocorecrg.github.io/MOP2/docs/
MIT License

mop_preprocess problem #59

Open ddioken opened 8 months ago

ddioken commented 8 months ago

Hi,

I need help with two problems I'm having with your tool on my M1 Mac. Here's what's going on:

Problem with Minimap2: I intended to use Minimap2, but when I started the pipeline with minimap2 as the mapper, it gave me this error message:

[25/98f82d] Submitted process > preprocess_simple:FASTQC:fastQC (starvation_cDNA.fastq)
ERROR ~ Error executing process > 'preprocess_simple:MINIMAP2:map (e2_45min_cDNA)'

Caused by: Process preprocess_simple:MINIMAP2:map (e2_45min_cDNA) terminated with an error exit status (1)

Command executed:

minimap2 -t 1 -a -uf -ax splice -k14 Homo_sapiens.GRCh37.cdna.fa e2_45min_cDNA.fastq | samtools view -@ 1 -F4 -hSb - > e2_45min_cDNA.bam

Command exit status: 1

Command output: (empty)

Command error: [main_samview] fail to read the header from "-".

It didn't tell me much, just that it couldn't read something it needed.
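The samtools message above only says that nothing readable as a SAM header arrived on its standard input, so the more informative error is likely to be whatever minimap2 printed before the pipe. A minimal sketch for running the two stages separately and keeping each exit code and stderr is shown here; the helper name `run_pipeline` and its return keys are hypothetical, and the commented usage assumes both tools are on the PATH of the environment where it runs (for example, inside the container).

```python
import subprocess
import tempfile


def run_pipeline(first_cmd, second_cmd, out_path):
    """Run `first_cmd | second_cmd > out_path` and report each stage separately."""
    with tempfile.TemporaryFile() as first_err, open(out_path, "wb") as out:
        # Stage 1 logs to a temp file so a chatty stderr cannot block the pipe.
        first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE, stderr=first_err)
        second = subprocess.Popen(second_cmd, stdin=first.stdout,
                                  stdout=out, stderr=subprocess.PIPE)
        # The parent drops its copy of the pipe so stage 1 is not kept alive
        # if stage 2 exits early.
        first.stdout.close()
        _, second_err = second.communicate()
        first_rc = first.wait()
        first_err.seek(0)
        first_log = first_err.read().decode(errors="replace")
    return {
        "first_rc": first_rc,
        "second_rc": second.returncode,
        "first_stderr": first_log,
        "second_stderr": second_err.decode(errors="replace"),
    }


# Example (commented out; assumes minimap2 and samtools are installed):
# result = run_pipeline(
#     ["minimap2", "-t", "1", "-a", "-uf", "-ax", "splice", "-k14",
#      "Homo_sapiens.GRCh37.cdna.fa", "e2_45min_cDNA.fastq"],
#     ["samtools", "view", "-@", "1", "-F4", "-hSb", "-"],
#     "e2_45min_cDNA.bam",
# )
# print(result["first_rc"], result["first_stderr"][-500:])
```

If `first_rc` is non-zero, the tail of `first_stderr` is the place to look; a plain shell pipe reports only the status of the last command unless `pipefail` is set.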

So I gave up on Minimap2 and tried the BWA aligner instead.

Issue with BWA and NanoPlot: With BWA, I got all my files, such as BAM and BAM.BAI, but the run did not finish. When I look at the Docker container, all the other files have been created (BAM, BAM.BAI, FastQC, counts, CRAM, etc.), but the NanoPlot step fails with this error:

2024-03-07 08:20:28 Matplotlib created a temporary config/cache directory at /tmp/matplotlib-qj3mdxlk because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-03-07 08:20:30 [E::idx_find_and_load] Could not retrieve index file for 'e2_45min_cDNA_s.bam'.

But I checked, and the index file is there, in the input folder itself.
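One way to double-check this from the directory where the failing step actually runs is sketched below. It assumes the index is expected next to the BAM under one of the usual names (`<name>.bam.bai`, `<name>.bai`, or `<name>.bam.csi`); the helper `find_bam_index` is hypothetical. It also distinguishes a missing index from a symlink whose target cannot be reached, which is one possibility when Nextflow stages inputs as symlinks and the process runs inside a container.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return (index_path, status) for the first index candidate next to a BAM file."""
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # e.g. sample.bam.bai
        bam.with_suffix(".bai"),           # e.g. sample.bai
        bam.with_name(bam.name + ".csi"),  # e.g. sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate, "ok"
        if candidate.is_symlink() and not candidate.exists():
            # The link is present but its target is not reachable from here.
            return candidate, "broken symlink"
    return None, "missing"


# Example (commented out): run this from the work directory of the failing task.
# print(find_bam_index("e2_45min_cDNA_s.bam"))
```

Note that the error names `e2_45min_cDNA_s.bam`, so the index has to match that file name, not the name of the unsorted BAM.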

I've already done the basecalling with another tool called Guppy and was just trying to use my fastq files with your tool.

Can you help me figure out what's wrong? Thank you for making the tool. It seems very useful. Hope I can run it!

lucacozzuto commented 8 months ago

Hi, can you send me the log of your first RUN please?

ddioken commented 8 months ago

Hey! Thank you for the response! I attached the log.

I also tried to run minimap2 separately, and it worked.

> (base) didemdkn@Didems-MBP mop_preprocess % minimap2 -t 1 -a -uf -ax splice -k14 /Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa /Users/didemdkn/Downloads/strvs45fastq3/str/combined_str.fastq > minimap2_output.sam

[M::mm_idx_gen::7.383*0.98] collected minimizers
[M::mm_idx_gen::12.194*0.97] sorted minimizers
[M::main::12.208*0.97] loaded/built the index for 180253 target sequence(s)
[M::mm_mapopt_update::12.520*0.97] mid_occ = 142
[M::mm_idx_stat] kmer size: 14; skip: 5; is_hpc: 0; #seq: 180253
[M::mm_idx_stat::12.679*0.97] distinct minimizers: 19969680 (30.84% are singletons); average occurrences: 4.822; average spacing: 2.982; total length: 287163541
[M::worker_pipeline::454.439*1.00] mapped 427041 sequences
[M::worker_pipeline::918.263*1.00] mapped 415536 sequences
[M::worker_pipeline::1260.503*1.00] mapped 307798 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 1 -a -uf -ax splice -k14 /Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa /Users/didemdkn/Downloads/strvs45fastq3/str/combined_str.fastq
[M::main] Real time: 1260.572 sec; CPU: 1257.124 sec; Peak RSS: 2.710 GB

I don't understand why it does not work inside the pipeline. I also checked my FASTQ files, and they look all right:

> (base) didemdkn@Didems-MBP str % head combined_str.fastq
> @8d0e28c5-ec55-4df1-9121-56c64dace674 runid=95ca060123d06576dc7f2f21c526a5c30e57a51c sampleid=mcf7stv030821 read=14 ch=260 start_time=2021-08-03T11:41:48Z
> GCCAUGGCCAAGAGAGGGCCCACCAGAAACGCAGCAGCAAACGGGCCCUAGAUGGACUGGAGCAAGAAAAACGAACUCUUCAGCUCCUCUGAGGUGCCCUGCUGCACCCAGAGGUGAUGCAGGGCCGAGCCAGCAUUCCACCCCACCUUUUCCACCCCCAAUUACUCCCUGAAUCGCCGUACAAAUCAGCACCCACAUCCCCUCUUGACAAAUGAUUUCUGGAGAACAUGUUUCCUGACUUUCAGGGAAGGUGAAUGCGUGCUUCCCGUCCUCCCGCAGUCAGAAAGGAGACUCUGCCUCCCUCCCUUGAGUGCCACACCUACCGGGUGUCCCUUUGCCACCCUGCCUGGACAUCGCUGGAACCUGCACAUAUGCCAGGAUCAUGGGACCAGGCGAGAGGGCACCCUCCUCCUCCCAUGUGAUAAUAGGGUUCCAGGGCUGAUCAGAACCUGAUUGCAGAACUGCCGCUCUCGGUGAUGGGCAUACGUUAUCCUGAGACCUGUGGCAGACACGUCUUGUCUUCAUGAUUCUGUUAAGAGUGCAGUAUUAAGAGUCAUUGAGGAAAUUUGUCUCGUGAUUAACAUGAUUUCCUGGUUGUCUACACCAGGGUCGGCAGUGGCCCAGCCUUAAACUUUGUUCCUACUCCCACCCUCUCAGCGAACUGGGUCGGAUGAGGAGGGUUUGGCUACCUCCCCCUGCCCAUCCCUGAGCCAGGUACCACCAUUGUCAAGGAAACACUUUCAGAAAUCAGCUGGUUCCUCCAAAAU
> +
> ,*/,-?=:74?@>4?&LG<(,$-3=??7886';<82<4?<<:277:7%$(0),,0&3/&$&&)+?;7==5788985/&.056/3(45&%**&-$%%17=<91661',5837'$%))(%%&'*;@/+40897165)*10756>5+%+%,&('&&$()341&++63'%08,66=?+##')).8/7:,;:AC=;662154.%542,%);8@665'49;74-%2+(6A?9-)0/2(*+=A>>0,*(&'##%(),+2906*%))07%&&)350%(%$'/($&%9+:A<A?D;5<@<('%+126).),*./2'67'$*)-)).'66365.'44,0+%12.)33<;.2>/-6,-+,,%.,'%$'9<4.+,%57;3+&&8<:2-42/A>-)%*422881@A;1=?3=9;<;77+*32-,.',,(''&;=;042.#&,,2)&82>662:8=-(88670'-0(%(&'&&==>53588'2(%4*%8=6%)&&+,*)((/9:9.;>=:)4<:3()%$%;A>DC@9@4/@40%'-,'$$'2*'65607C?>>>;>*6-'..%54979.--;<-445>$/-<:2$#$"(.1B5(%$&%777,+921207**.$&%&&195513'%&%+24<4?5%++01;(+;993**<8833''$4248;),.0$&#10$$.0,0:*00$%-(0%0'=8C>>=A@%=<<C66)&'.3=;36@D>6)688=9=8864/+2)*'#,4.+-/&%9:05=;:<91877=E@>B:.4655,84;3&2&.=;6,,)
> 
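Since `head` only shows the first record, a slightly broader check can be sketched as follows. It assumes plain four-line FASTQ records (no wrapped sequences); the function name `check_fastq` and the `max_records` limit are hypothetical. It verifies the `@` and `+` lines, compares sequence and quality lengths, and tallies the bases seen, which also shows which alphabet the reads use (the record quoted above contains U rather than T).

```python
from collections import Counter


def check_fastq(lines, max_records=1000):
    """Check 4-line FASTQ records; return (n_records, base_counts, problems)."""
    problems = []
    bases = Counter()
    n_records = 0
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            problems.append(f"record {n_records + 1}: truncated")
            break
        n_records += 1
        if not header.startswith("@"):
            problems.append(f"record {n_records}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n_records}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n_records}: sequence/quality length mismatch")
        bases.update(seq.upper())
        if n_records >= max_records:
            break
    return n_records, bases, problems


# Example (commented out):
# with open("combined_str.fastq") as handle:
#     n, bases, problems = check_fastq(handle)
#     print(n, dict(bases), problems[:5])
```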

log240314_2.txt

lucacozzuto commented 8 months ago

Hi, I cannot read the command line from your log. Can you send it to me, please? Are the minimap2 and samtools you used the ones in our Docker image? I have not tested on the M2 processor yet, but hopefully I'll have one soon.

ddioken commented 8 months ago

(base) didemdkn@Didems-MBP mop_preprocess % nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac --fast5 " " --fastq "/Users/didemdkn/Downloads/strvs45fastq3/*/.fastq" --reference "/Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa" --annotation "/Users/didemdkn/Downloads/Homo_sapiens.GRCh37.gtf" --output "/Users/didemdkn/Downloads/strvs45fastqpreprocess" --ref_type transcriptome --mapping minimap2 --counting nanocount --saveSpace YES > log240314_2.txt

This is my command line. I checked it many times and it looks OK to me, but I couldn't find the problem :/
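As rendered in this thread, the `--fastq` pattern reads `*/.fastq`; whether an asterisk was lost when the command was pasted is unclear. A quick way to see which files a given pattern would pick up is sketched below. It uses Python's `glob`, whose matching rules are assumed to be close enough to the pipeline's for this simple pattern, and the helper name `expand_pattern` is hypothetical.

```python
import glob
import os


def expand_pattern(pattern):
    """Return the sorted list of regular files matched by a glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Example (commented out): compare the two spellings of the pattern.
# for pattern in ("/Users/didemdkn/Downloads/strvs45fastq3/*/.fastq",
#                 "/Users/didemdkn/Downloads/strvs45fastq3/*/*.fastq"):
#     print(pattern, len(expand_pattern(pattern)))
```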

lucacozzuto commented 8 months ago

Hi. Just checking your minimap2 command line... I think you don't need the parameters for splicing, since it is cDNA, no? Try to choose the right parameters (they are stored in the *tools_opt.tsv file indicated in the params.config file).