biocorecrg / master_of_pores

Nextflow pipeline for analysis of direct RNA Nanopore reads
https://biocorecrg.github.io/master_of_pores/
MIT License

Counts feature error during Nanopreprocess run #97

Closed Maheen94 closed 2 years ago

Maheen94 commented 3 years ago

Hi, I'm using master of pores for RNA seq data analysis. I used reference and annotation files from Gencode. However, I get the following error consistently at the counts step. Would really appreciate any assistance. Thanks!!

Maheen

[Screenshot of the error message: Screen Shot 2021-05-31 at 4 30 52 PM]

lucacozzuto commented 3 years ago

Hi, it seems the file is not accessible. Can you go to the working folder indicated and check whether the link to the annotation file is working?

Luca
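
One way to run the check suggested above is sketched here. It assumes a Bash shell on the host; `check_link` is a hypothetical helper name, and the file name in the example call is only the annotation mentioned in this thread, which may differ from what is staged in your work directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a staged path resolves to a readable file.
check_link() {
    local path="$1"
    # Show where the symlink points, if it is one.
    if [ -L "$path" ]; then
        echo "symlink -> $(readlink "$path")"
    fi
    # -r follows symlinks, so a dangling link fails this test.
    if [ -r "$path" ]; then
        echo "OK: $path is readable"
    else
        echo "BROKEN: $path does not resolve to a readable file"
        return 1
    fi
}

# Example call, run from inside the failing task's work directory:
#   check_link gencode.v38.chr_patch_hapl_scaff.annotation.gtf
```

A link that resolves on the host can still be broken inside a container if the target directory is not mounted there, so a passing check here does not rule out a mount problem.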

Maheen94 commented 3 years ago

Hi,

So, I downloaded the reference genome (FASTA) and the corresponding annotation file (GTF) from Gencode using the following commands in the anno folder:

wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/GRCh38.p13.genome.fa.gz
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf.gz

The alignment works fine, but it's specifically the counts step that throws an error. I also tried other annotation files from Gencode, but the same problem persisted.

Maheen

lucacozzuto commented 3 years ago

Mmm... Can you try unzipping the GTF file first? I'll double-check whether this is a bug. Thanks for letting me know!
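
Decompressing the annotation while keeping the original archive could look like the sketch below. It assumes Bash and the standard `gzip` tool; `unpack_keep` is a hypothetical helper name, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress FILE.gz to FILE, keeping the original archive.
unpack_keep() {
    local gz="$1"
    local out="${gz%.gz}"
    # gzip -t verifies the archive; a truncated download fails here.
    gzip -t "$gz" || return 1
    # -d decompresses, -c writes to stdout so the .gz is left in place.
    gzip -dc "$gz" > "$out" || return 1
    echo "$out"
}

# Example call:
#   unpack_keep gencode.v38.chr_patch_hapl_scaff.annotation.gtf.gz
```

The annotation entry in params.config would then need to point to the unzipped file name.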

Maheen94 commented 3 years ago

Sure, will try unzipping the file first. Will keep you updated.

Maheen94 commented 3 years ago

Tried rerunning it with unzipped file but ran into the same error:

Also, it says "Error occurred when processing GFF file". I always use GTF format and didn't run into issues. I'm wondering if I should use GFF3 format instead?

Command exit status: 1

Command output: (empty)

Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Error occured when processing GFF file (line 1 of file /home/ubuntu/environment/master_of_pores/NanoPreprocess/../anno/gencode.v38.chr_patch_hapl_scaff.annotation.gtf):
[Errno 2] No such file or directory: '/home/ubuntu/environment/master_of_pores/NanoPreprocess/../anno/gencode.v38.chr_patch_haplscaff.annotation.gtf'
[Exception type: FileNotFoundError, raised in init.py:47]

lucacozzuto commented 3 years ago

Well, no. That process uses htseq-count, which is able to read GTF files, so I don't understand why you get this error. But again I see a "file not found". Can you go to that temporary folder and check whether the link is OK?

Maheen94 commented 3 years ago

The link is actually ok. Just double checked it

lucacozzuto commented 3 years ago

Ok. So you can run that command just by doing

singularity exec -e NAMEOFIMAGE .command.sh 

the name of the image can be retrieved by doing a

grep singu .command.run

Maheen94 commented 3 years ago

Command 'singularity' not found, but can be installed with:

I'm using docker, I normally start my pipeline with the following command: nextflow run nanopreprocess.nf -with-docker

lucacozzuto commented 3 years ago

Aha. Well, maybe there is a problem with the mounting of volumes in Docker. If you grep for docker inside .command.run, you will find the command to use.
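
The lookup described above can be sketched as a small helper. It assumes Bash and that it is run inside the failing task's work directory, where Nextflow writes the .command.run wrapper; `show_container_lines` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a Nextflow task wrapper that mention
# a given container engine ("docker" or "singularity").
show_container_lines() {
    local engine="$1"
    local run_file="${2:-.command.run}"
    # -n prefixes each match with its line number in the wrapper script.
    grep -n -- "$engine" "$run_file"
}

# Example call, from inside the failing task's work directory:
#   show_container_lines docker
```

The matching lines should include the container invocation, and with Docker it is worth checking whether the folder that holds the annotation file appears among the mounted volumes.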

Maheen94 commented 3 years ago

Hi,

So, I changed some parameters in params.config.

Previous: ref_type "genome" (theoretically this makes sense, since I'm using a genome reference)
Changed: ref_type "transcriptome" (it ran smoothly with this setting, even though I used the same reference genome downloaded into the anno folder with the wget command)

lucacozzuto commented 3 years ago

Changing to transcriptome changes the tool used for counting. So something weird is happening with the htseq-count tool. Did you solve it?

ash-kh commented 3 years ago

I have the same error too; I tried it with both the gzipped and the gunzipped file. The command works well when I run it using the system's htseq but fails when using the Singularity container.

The transcriptome mapping works

lucacozzuto commented 3 years ago

Ouch. Can you send me a single fast5 file to try?

lucacozzuto commented 3 years ago

My email is luca.cozzuto /at/ crg . eu

ash-kh commented 3 years ago

Thanks for the super quick response!

I tried it with the test data provided as well and get the same error.

Best, Ashkan


lucacozzuto commented 3 years ago

Hi, the test dataset is a transcriptome, so it won't use htseq-count (which needs the GTF annotation). If you send me a single fast5 file and tell me which genome and annotation you are using, I'll give it a try and debug.

Maheen94 commented 3 years ago

Hi, for me the same error persisted. Only the transcriptome mapping works.


ash-kh commented 3 years ago

I sent you a mail

lucacozzuto commented 3 years ago

Hi, I replied to the mail. I was unable to reproduce the error. I have Singularity version 3.2.1 and am using these params:

params {
kit                 = "SQK-RNA001"
flowcell            = "FLO-MIN106"
fast5               = "$baseDir/../data/input/*.fast5"
reference           = "$baseDir/../anno/GRCh38.primary_assembly.genome.fa.gz"
annotation          = "$baseDir/../anno/gencode.v38.annotation.gtf"
ref_type            = "genome"

seq_type            = "RNA"
output              = "$baseDir/output"
qualityqc           = 5
granularity         = ""

basecaller          = "guppy"
basecaller_opt      = ""
GPU                 = "OFF"
demultiplexing      = ""
demultiplexing_opt  = ""
demulti_fast5       = "OFF"

filter              = ""
filter_opt          = ""

mapper              = "minimap2"
mapper_opt          = ""
map_type            = "spliced"

counter             = "YES"
counter_opt         = ""

variant_caller      = "NO"
variant_opt         = ""

downsampling        = ""

email               = ""
}

lucacozzuto commented 2 years ago

Hi, it looks like htseq has a problem with some long-read mappers. We fixed this in the new version of Master of Pores:

https://github.com/biocorecrg/MOP2