Magdoll / Cogent

Coding Genome Reconstruction using Iso-Seq data
BSD 3-Clause Clear License

reconstruct_contig.py output files #43

Closed. Zunaick closed this issue 6 years ago.

Zunaick commented 6 years ago

Hi Magdoll

I ran reconstruct_contig.py on each gene family and obtained the following four FASTA files for each family:

in.trimmed.fa cogent.fa cogent2.renamed.fasta cogent2.fa

I checked the number of sequences in the output files for a few families and found that they are the same, except for cogent.fa. So I am wondering which one is the final reconstructed contig file (in the cases I checked, the number of unique transcripts in cogent2.fa is the same as in in.fa). It would be really helpful if you could help me understand these output files.

Thank you

Magdoll commented 6 years ago

Hi @Zunaick ,

in.fa is the input, but it is first trimmed of all flanking lowercase bases to become in.trimmed.fa, which is then run through Cogent.
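Cogent performs this trimming itself; the sketch below only illustrates what "trimming flanking lowercase bases" means, assuming the flanks are soft-masked (lowercase) letters at either end of each sequence and that internal lowercase bases are kept. The function names trim_flanking_lowercase and trim_fasta_text are hypothetical and are not part of Cogent.

```python
import string


def trim_flanking_lowercase(seq: str) -> str:
    # str.strip with a character set removes any run of those characters
    # from both ends only, so internal lowercase bases are left untouched.
    return seq.strip(string.ascii_lowercase)


def trim_fasta_text(fasta_text: str) -> str:
    """Return FASTA text with each record's flanking lowercase bases removed.

    Assumption (hypothetical helper): sequences may span several lines;
    the output writes each sequence on a single line.
    """
    records, header, chunks = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return "".join(f"{h}\n{trim_flanking_lowercase(s)}\n" for h, s in records)
```

For example, a record whose sequence is acgtACGTacGTgg would become ACGTacGT: the leading acgt and trailing gg are removed, while the internal ac stays.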

cogent2.fa is the output.

If the number of sequences in the input (in.trimmed.fa) and the output (cogent2.fa) is identical, it means no reconstruction was possible. However, this should not happen for every single gene family. In most cases, Cogent can find some shared exons among the input transcript sequences.
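A quick way to run the comparison described above across many families is to count FASTA headers in both files. This is a minimal sketch, assuming one subdirectory per gene family under the current directory, each containing in.trimmed.fa and cogent2.fa; the helpers count_fasta_records and family_counts are hypothetical, not Cogent tools.

```python
from pathlib import Path
from typing import Iterable, Tuple


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def family_counts(family_dir) -> Tuple[int, int]:
    """Return (input count, output count) for one gene family directory.

    Assumption: the directory holds in.trimmed.fa (input) and cogent2.fa (output).
    """
    family_dir = Path(family_dir)
    with open(family_dir / "in.trimmed.fa") as fin, open(family_dir / "cogent2.fa") as fout:
        return count_fasta_records(fin), count_fasta_records(fout)


if __name__ == "__main__":
    # Hypothetical layout: one subdirectory per gene family in the current directory.
    for d in sorted(p for p in Path(".").iterdir() if p.is_dir()):
        if (d / "in.trimmed.fa").exists() and (d / "cogent2.fa").exists():
            n_in, n_out = family_counts(d)
            status = "identical: no reconstruction" if n_in == n_out else "counts differ"
            print(f"{d.name}\t{n_in}\t{n_out}\t{status}")
```

If nearly every family reports identical counts, that is the situation worth sending examples for.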

If you would like me to take a look at a few examples, please send me your email address so I can request a file upload.

--Liz

ClearloveMiao commented 6 years ago

Hi! So the file we should use in the next step is cogent2.renamed.fasta? Can the other files be deleted? Thanks!

Magdoll commented 6 years ago

Hi @ClearloveMiao ,

You can either use cogent2.fa or cogent2.renamed.fasta. The only difference is the sequence ID.

In cogent2.fa, the IDs contain only the path names:

>path0
>path5
...

In cogent2.renamed.fasta, a prefix is added so that each ID becomes unique:

>human_0|path0
>human_0|path5
...

If you intend to collect the output from all gene families, you should use cogent2.renamed.fasta so that each sequence has a unique ID.
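The reason is that path IDs such as path0 repeat across families, while prefixed IDs do not. The sketch below concatenates every family's cogent2.renamed.fasta into one file and checks that the IDs are really unique before writing. It assumes one directory per family; duplicate_ids and collect_renamed are hypothetical helpers, and human_1 in the usage note is an invented family prefix for illustration.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def duplicate_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs (first word after '>') that occur more than once."""
    counts = Counter(
        (line[1:].split() or [""])[0] for line in lines if line.startswith(">")
    )
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


def collect_renamed(family_dirs: Iterable[Path], out_path: Path) -> None:
    """Concatenate each family's cogent2.renamed.fasta into one FASTA file.

    Raises ValueError instead of writing if any sequence ID is duplicated.
    """
    lines: List[str] = []
    for family_dir in family_dirs:
        with open(Path(family_dir) / "cogent2.renamed.fasta") as handle:
            # Guarantee a newline on every line so records never run together.
            lines.extend(l if l.endswith("\n") else l + "\n" for l in handle)
    dups = duplicate_ids(lines)
    if dups:
        raise ValueError(f"duplicate sequence IDs, e.g. {dups[:5]}")
    with open(out_path, "w") as out:
        out.writelines(lines)
```

Run duplicate_ids on headers taken from two cogent2.fa files and >path0 is likely reported twice; with prefixed headers such as >human_0|path0 and >human_1|path0, the list comes back empty.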

--Liz

Zunaick commented 6 years ago

Thanks Magdoll

I would like to share a few gene family files with you. Could you please tell me the email address I should use to upload the files? Also, I got the following warning while executing the script:

/nfsroot/home/user/.local/lib/python2.7/site-packages/Bio/Seq.py:354: BiopythonDeprecationWarning: This method is obsolete; please use str(my_seq) instead of my_seq.tostring(). BiopythonDeprecationWarning)

I ignored it since it is a warning and not an error. Is that fine?

Magdoll commented 6 years ago

Hi @Zunaick ,

  1. I need your email address to send a private file-upload request :-)

  2. The warning message is harmless and can be safely ignored.
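If the message clutters the logs, it can be filtered by its text with Python's standard warnings module. This is a sketch under the assumption that you control a Python entry point; silence_tostring_warning is a hypothetical helper, and it matches on the message quoted above so Biopython does not need to be imported.

```python
import warnings


def silence_tostring_warning() -> None:
    # filterwarnings treats `message` as a regular expression matched
    # case-insensitively against the START of the warning text, so the
    # parentheses are escaped. The text is copied from the quoted warning.
    warnings.filterwarnings(
        "ignore",
        message=r"This method is obsolete; please use str\(my_seq\)",
    )
```

Without editing any code, the same filter should also be expressible from the command line as python -W "ignore:This method is obsolete" ... or through the PYTHONWARNINGS environment variable with that same value; in both forms the message part is a literal prefix rather than a regular expression.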

-Liz

Magdoll commented 6 years ago

Closed until further notice from OP.