carolzhou / multiPhATE2

multiPhATE with comparative genomics

Functional Annotation Only - continued #18

Closed rfcohen closed 3 years ago

rfcohen commented 3 years ago

Thanks for all your help. I love this tool.

Following on from issue 16.

I have a custom set of gene calls that were hand curated. What should that file be named and where should it go? Then what should the proper settings in the config file be to use this as the input to the annotation engine? I have this now: filename is PipelineInput/phate_custom.gff

Config file settings are:

custom_gene_calls='true'
custom_gene_caller_name='custom'
primary_calls='custom'

And the error is this:

[Errno 2] No such file or directory: '/home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs/custom.cgc'
primary calls file, /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs/custom.cgc
phate_sequenceAnnotation_main says, ERROR: Check the formats of your input file(s):
genome file is /home/rcohen/Documents/multiPhATE2/PipelineInput/EcCH94Phi94_contigs.fasta
primary gene call file is /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs/custom.cgc
outfile is /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs/phate_sequenceAnnotation_main.out
gfffile is /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs/phate_sequenceAnnotation_main.gff

Thanks.

carolzhou commented 3 years ago

Name your custom gene-call file myGenome.custom.gff, where myGenome matches the output subdirectory name (ideally this also matches the genome name in your genome's fasta file, myGenome.fasta). Place the custom gene-call file in the PipelineInput/ directory. The code recognizes the file by its name and moves it to the corresponding output subdirectory. If you are running more than one genome, you need a custom gene-call file for each genome. The custom gene-call file should be in GFF format, like the GFF output that Prodigal produces. Ignore the custom_gene_caller_name configuration parameter; it was not useful, and I removed it from the sample configuration file.
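As a rough illustration of the naming rule above, the following Python sketch builds the expected file name from an output subdirectory name and reports which files are missing from PipelineInput/. It is not part of multiPhATE2; the function names expected_custom_gff and missing_custom_gffs are hypothetical, and the sketch assumes only the convention described in this comment.

```python
from pathlib import Path


def expected_custom_gff(output_subdir_name, input_dir="PipelineInput"):
    """Return the path where the custom gene-call file would be expected.

    Assumption: the file is named <output_subdir_name>.custom.gff and lives
    in the PipelineInput/ directory, as described in the comment above.
    """
    return Path(input_dir) / f"{output_subdir_name}.custom.gff"


def missing_custom_gffs(output_subdir_names, input_dir="PipelineInput"):
    """List the expected custom gene-call files that do not exist yet."""
    expected = (expected_custom_gff(name, input_dir) for name in output_subdir_names)
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # One custom gene-call file is needed per genome.
    for path in missing_custom_gffs(["EcCH94Phi94_contigs"]):
        print(f"missing custom gene-call file: {path}")
```

Running a check like this before starting the pipeline can catch a misnamed file (for example phate_custom.gff) early.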

rfcohen commented 3 years ago

Thanks. This helps a lot. Still have an error with the .cgc file.

[Errno 2] No such file or directory: '/home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs_test/custom.cgc'
primary calls file, /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs_test/custom.cgc
phate_sequenceAnnotation_main says, ERROR: Check the formats of your input file(s):
genome file is /home/rcohen/Documents/multiPhATE2/PipelineInput/EcCH94Phi94_contigs.fasta
primary gene call file is /home/rcohen/Documents/multiPhATE2/PipelineOutput/EcCH94Phi94_contigs_test/custom.cgc

There is no custom .cgc file created. Where does it come from?

Any guidance would be greatly appreciated.

carolzhou commented 3 years ago

Please post your custom gene-call file, or send to me via email, if you prefer.

rfcohen commented 3 years ago

Thx. Would be happy to email the gene file. What’s the best email address?

carolzhou commented 3 years ago

multiphate@gmail.com

rfcohen commented 3 years ago

Thank you. I emailed the files.

-Rob

On Jan 16, 2021, at 4:22 PM, Carol Zhou notifications@github.com wrote:

 multiphate@gmail.com


carolzhou commented 3 years ago

Hi Rob, your custom gene-call file has 8 columns; it should have 9. It is missing the "score" column, which comes after the start/stop columns and before the strand (+/-).
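A column check like the one described above can be sketched in a few lines of Python. This is only an illustration, not code from multiPhATE2: the helper names are hypothetical, and the sketch assumes the standard tab-separated GFF layout (seqid, source, type, start, end, score, strand, phase, attributes). The repair helper further assumes that score is the only missing column and fills it with ".", the GFF placeholder for an undefined value; whether the pipeline accepts "." as a score is not confirmed here.

```python
def check_gff_columns(lines, expected=9):
    """Return (line_number, column_count) for data lines with the wrong column count.

    Blank lines and comment/directive lines (starting with '#') are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.startswith("#"):
            continue
        count = len(stripped.split("\t"))
        if count != expected:
            problems.append((number, count))
    return problems


def insert_placeholder_score(line):
    """Insert '.' as the score (6th column) of an 8-column GFF line.

    Assumption: the line lacks only the score column. Lines that do not
    have exactly 8 columns are returned unchanged (minus the newline).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        return "\t".join(fields)
    return "\t".join(fields[:5] + ["."] + fields[5:])


if __name__ == "__main__":
    sample = [
        "##gff-version 3\n",
        "contig1\tcustom\tCDS\t10\t90\t+\t0\tID=1\n",  # score column missing
    ]
    for number, count in check_gff_columns(sample):
        print(f"line {number}: found {count} columns, expected 9")
```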

rfcohen commented 3 years ago

Thank you. Will check it out.

But that doesn’t explain why the prodigal file produced the same error. If the input should be modeled like the prodigal file and I used the output of prodigal as the input as a test, it should have worked?

-Rob

On Jan 16, 2021, at 11:46 PM, Carol Zhou notifications@github.com wrote:

 Hi Rob, Your custom gene-call file has 8 columns--should be 9. It is missing the "score" column, which occurs after the start/stop columns and before the strand (+/-).


carolzhou commented 3 years ago

The issue was related to the use of checkpoints. The code appears to be functioning. Thank you for using multiPhATE2! :-)