genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

Docs #128

Closed bruab closed 10 years ago

bruab commented 10 years ago

Here are all the additions/changes that need to happen to the docs, from @smg283 :

~~1. List what features GAG reads from GFF3 and what features are ignored (or at least what is read). Maybe explain how you could include ignored features back in your gff3 after the fact, and if you have a feature that you want to keep, contact the mailing list to ask to get it added.~~

2. Suggest creating a symbolic link to the genome.fasta and gff file (rather than renaming), teach good methods.

3. For the "what is a genome or fasta file" section, say you can supply a scaffolded or unscaffolded assembly file (although you would suggest a scaffolded assembly).

4. GFF file: be clear that it is a GFF3 file (not GFF2). Maybe even change to require genome.gff3 (someone inevitably will have a gff2 file), or repeatedly say gff3.

~~5. Citing GAG, put your name first and my name last if that is ok with you.~~

6. Read in genome. Explain that "." is your current folder; if you are not in the current folder, give a direct or relative path to the genome folder (so people know what "." means).

7. Removing Terminal "N's". State something like "depending on the assembler you used, your assembly may contain terminal 'N's". Also s/read/red/

~~8. More terminal "N's": State that we only remove Ns at the end of scaffolds, but not low quality bases within a scaffolded assembly.~~

~~9. More terminal Ns: You say: Exit and have a look at the results. If you open the file 'terminal_ns_removed/genome.fasta' you can see that it contains no 'N's. Additionally, the indices of each feature in 'terminal_ns_removed/genome.gff' are updated to correspond to the new, cleaned up sequence. You might want to reword to say that there are no N's at the beginning or end of the sequence (to better explain what it is doing). The sequence has been shortened, and the coordinates have been updated (maybe use info to show this?)~~

~~10. Introns: you say: It's pretty common to find genome assemblies with shorter introns, though. Luckily, GAG can remove them for you. Reword to: It's pretty common to find automated annotations generating shorter introns, though. Luckily, GAG can remove or flag them for you.~~

~~11: I think you need to do a start and stop command. If I am not mistaken, you need to run start and stop if they aren't explicitly listed in the gff3 file (which they often are not). Otherwise, GAG will list them as partials. This is a big thing that will cause errors for folks if they don't run it (and then they can check in "info" to confirm).~~

~~12: Eventually add more info on Annotations. Maybe display a bit of a gff3 file, showing where the gene name should be in attributes, where the protein product should be (do we handle this now?) and the DBxref/ontology. Then show how it ends up in the tbl.~~

13. I would state that you can remove or flag a little more up front (like maybe min_intron section), maybe put full list a bit higher in the manual, or a link down the page to the table in the first paragraph (for a full list of commands click here).

~~14. Really overstate the python 2 thing, maybe even give link to download python2.~~

15. Explain how to reset a remove/flag operation.
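
For the terminal N items (7 to 9), a tiny illustration in the docs might make the behavior clearer. The sketch below is not GAG's actual code: the function names `trim_terminal_ns` and `shift_feature` are hypothetical, and it assumes 1-based inclusive GFF3 coordinates and a feature that does not overlap the trimmed region. It shows that only Ns at the ends of a sequence are removed, internal Ns stay, and feature coordinates shift by the number of bases removed from the start.

```python
def trim_terminal_ns(seq):
    """Strip N/n from both ends of seq.

    Returns (trimmed_sequence, number_of_bases_removed_from_start).
    Internal Ns are left untouched.
    """
    left_trimmed = seq.lstrip("Nn")
    removed_from_start = len(seq) - len(left_trimmed)
    trimmed = left_trimmed.rstrip("Nn")
    return trimmed, removed_from_start


def shift_feature(start, end, removed_from_start):
    """Shift 1-based inclusive coordinates left by the leading bases removed.

    Assumption: the feature lies entirely inside the kept region, so no
    clipping is attempted here.
    """
    return start - removed_from_start, end - removed_from_start


seq = "NNNACGTNNACGTNN"
trimmed, offset = trim_terminal_ns(seq)
print(trimmed)                       # ACGTNNACGT (internal Ns kept)
print(shift_feature(4, 7, offset))   # (1, 4)
```

Only bases removed from the start change the coordinates; Ns trimmed from the end shorten the sequence but leave upstream features where they are.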