BrendelGroup / AEGeAn

Integrated toolkit for analysis and evaluation of annotated genomes
http://brendelgroup.github.io/AEGeAn
ISC License

ParsEval attempts to infer UTRs when only CDS is provided #71

Closed standage closed 11 years ago

standage commented 11 years ago

When only the CDS is provided explicitly (no exon or UTR features), ParsEval attempts to infer UTRs. Of course it fails, since without exons explicitly provided, there is no way to infer UTRs (whether CDS is provided as CDS features or as start/stop codons).
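To illustrate why exon coordinates are required, here is a minimal Python sketch of UTR inference as a plain interval computation: the UTRs are whatever exon sequence lies outside the span of the CDS. This is not ParsEval's actual implementation; the function name `infer_utrs` and the tuple-based interval representation are assumptions made for the example. With no exons there is nothing to subtract the CDS from, so the function can only refuse.

```python
def infer_utrs(exons, cds, strand="+"):
    """Return (five_prime_utrs, three_prime_utrs) as lists of (start, end).

    `exons` and `cds` are lists of 1-based inclusive (start, end) tuples,
    the coordinate convention used by GFF3. Hypothetical helper, for
    illustration only.
    """
    if not exons:
        # Without exons there is no transcript boundary to measure against.
        raise ValueError("cannot infer UTRs without exon coordinates")
    if not cds:
        raise ValueError("cannot infer UTRs without CDS coordinates")

    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)

    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exon sequence to the left of the coding span.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon sequence to the right of the coding span.
            right.append((max(start, cds_end + 1), end))

    # On the reverse strand the right-hand side is the 5' end.
    return (left, right) if strand == "+" else (right, left)
```

For example, `infer_utrs([(100, 200), (300, 400)], [(150, 200), (300, 350)])` yields one 5' UTR segment and one 3' UTR segment, while `infer_utrs([], [(150, 200)])` raises an error, which mirrors the situation described in this issue.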

ckuanglim commented 11 years ago

My reference GFF3 input has "CDS" features but does not have "exon" or "intron" features. Can I infer only exons and introns? UTRs are not important to me.

standage commented 11 years ago

@ckuanglim Did this not fix the problem?

standage commented 11 years ago

Unfortunately there is not much I can do with a phrase like "fail to make AgnInferCDSVisitor.c"--I could be much more helpful if you shared the error message generated by the compiler.

Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University

On Mon, Sep 9, 2013 at 10:00 PM, ckuanglim notifications@github.com wrote:

fail to make AgnInferCDSVisitor.c

ckuanglim commented 11 years ago

I replaced the file 'AgnInferCDSVisitor.c' and ran make again. The build was successful.

When I ran parseval again, many zero-size files (with names ending in the scaffold name) were created in the working directory. It then exited with many warnings of the form "warning: CDS for mRNA 'XXXXX' has length of XXX, not a multiple of 3".

standage commented 11 years ago

I would suggest using git to download AEGeAn. This will make it much easier to integrate new updates as I post them. The docs include instructions for downloading AEGeAn with git. Once a new version is available, you can update by running "git pull origin master" in the AEGeAn directory.

The CDS length warning message shouldn't be serious. Are there any other messages? I understand you may have restrictions on sharing data, but it would be much easier to troubleshoot if I had access to the data files with which you are working--even if it's only a subset.
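To see which annotations trigger the CDS length warning, the check can be reproduced outside ParsEval. The following is a minimal Python sketch, assuming standard nine-column, tab-separated GFF3 with a `Parent` attribute on each CDS feature; the function names are hypothetical and this is not how ParsEval itself computes the length.

```python
from collections import defaultdict


def cds_lengths(gff3_lines):
    """Sum CDS segment lengths per parent mRNA ID from GFF3 text lines."""
    totals = defaultdict(int)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        # A CDS segment may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                totals[parent] += end - start + 1  # GFF3 ends are inclusive
    return dict(totals)


def not_multiple_of_three(gff3_lines):
    """Return {mRNA ID: CDS length} for CDSs whose length is not divisible by 3."""
    return {m: n for m, n in cds_lengths(gff3_lines).items() if n % 3 != 0}
```

Calling `not_multiple_of_three(open("R.gff3"))` would list the affected mRNAs in the reference file named in this thread, which makes it easier to decide whether they are partial gene models or annotation errors.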

On Mon, Sep 9, 2013 at 10:20 PM, ckuanglim notifications@github.com wrote:

warning: CDS for mRNA 'XXXXX' has length of XXX, not a multiple of 3

Lots of this warning message in STDERR, is this ok?

ckuanglim commented 11 years ago

When I use a small GFF3 set (only 3 scaffolds for reference and prediction), parseval runs successfully. But when I use the full GFF3 set (~40k scaffolds), it exits with no comparison results. Do you think this might be caused by an array size limit or something similar?

standage commented 11 years ago

No, array sizes should not be an issue.

I don't know what you mean by "exit with no comparison result"--did the program print any relevant warning/error messages? Did the program complete successfully and simply say "no loci to report", or did it crash? I am eager to help you resolve this issue, but you have to give me more information to work with.

standage commented 11 years ago

I just updated the code based on a different issue, but based on your previous report it may be related. See issue #64.

ckuanglim commented 11 years ago

After rebuilding AEGeAn, the problem is still the same. My command: "parseval -c 0 -t 0 -n 8 -g -v -w -o txt R.gff3 P.gff3 1>out 2>err"

Situation A: I use a small GFF3 set (1000 scaffolds), and parseval runs successfully.

Situation B: I use the complete GFF3 set (~40k scaffolds). When the program finishes:

(1) "txt" and "out" are empty.
(2) 1020 files (named "txt.scaffoldXXXX") are created in the working directory, and all 1020 are empty.
(3) "err" contains many warnings of the form "warning: CDS for mRNA 'XXXXXXX' has length of XXX, not a multiple of 3".
(4) "err" also contains "warning: max number of open files is 1024, but there are 4156 sequences to be compared; if ParsEval crashes, this is probably why; use 'ulimit -S -n $newlimit' to adjust this setting".
(5) The last line of "err" is "error: could not open 'txt.scaffold01091' (Too many open files)."

I then changed my resource limit with "ulimit -S -n 50000", and parseval finally ran successfully.

This is a limitation on my side. I am sorry to trouble you. Thank you.
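The open-file limit that caused this failure can also be inspected and raised programmatically. Below is a minimal Python sketch using the standard-library `resource` module (Unix only); the helper name `ensure_open_file_limit` is hypothetical. Note the assumption that the limit is raised in the calling process: it applies to that process and any children it starts, so it does not change the limit of an already running shell.

```python
import resource


def ensure_open_file_limit(needed):
    """Raise the soft RLIMIT_NOFILE to at least `needed`, if the hard limit allows.

    Returns the soft limit in effect afterwards. The soft limit can only be
    raised up to the hard limit without extra privileges.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < needed:
        if hard == resource.RLIM_INFINITY:
            target = needed
        else:
            target = min(needed, hard)  # cannot exceed the hard limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft
```

If the returned value is still smaller than the number of sequences to be compared (4156 in the warning above), the hard limit is the constraint and has to be changed by an administrator. For interactive use, the "ulimit -S -n" command quoted in the warning is the direct shell equivalent.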

standage commented 11 years ago

I knew the "too many open files" issue was a problem, but I had not implemented very good warning messages. I appreciate your feedback and your patience. Please feel free to open another thread if you encounter any other issues with AEGeAn.