Shamir-Lab / Recycler

This is the codebase for Recycler, described in our manuscript, https://academic.oup.com/bioinformatics/article/33/4/475/2623362, by Roye Rozov, Aya Brown Kav, David Bogumil, Naama Shterzer, Eran Halperin, Itzhak Mizrahi, and Ron Shamir.
BSD 3-Clause "New" or "Revised" License

output sequence fasta #11

Closed EmilyHoedt closed 7 years ago

EmilyHoedt commented 7 years ago

Hello, I would like to know whether the .cycs.fasta file is meant to contain the sequences of each of the predicted plasmids of a given genome and, if not, how I can obtain them.

I've tried: python recycle.py -g assembly_graph.fastg -k 77 -b p9.SA7518_S79_R1_001.bam -i True > p9.cycs.fasta

My current output is:
154.187 2112.15358778 312.39175
================== path, coverage levels when added ====================
6 nodes remain in component

2 nodes remain in component

('EDGE_134_length_79_cov_402.5', 'EDGE_110_length_91_cov_243.929', 'EDGE_61_length_58956_cov_140.228')
before [402.5, 243.929, 140.228]
after [262.10192547616219, 103.5309254761622, 0]
474 nodes remain in component

('EDGE_270_length_49661_cov_147.474', 'EDGE_201_length_95_cov_266.333', 'EDGE_134_length_79_cov_402.5', 'EDGE_109_length_112_cov_164.886', 'EDGE_242_length_92_cov_256.467', 'EDGE_90_length_33867_cov_154.374', 'EDGE_278_length_203_cov_192.841', 'EDGE_181_length_13154_cov_136.653', "EDGE_269_length_103_cov_258.308'")
before [147.474, 266.333, 262.10192547616219, 164.886, 256.467, 154.374, 192.841, 136.653, 258.308]
after [0, 117.66025606000321, 113.42918153616537, 16.213256060003175, 107.79425606000316, 5.7012560600031748, 44.168256060003188, 0, 109.63525606000317]
470 nodes remain in component

470 nodes remain in component

6 nodes remain in component

15 nodes remain in component

15 nodes remain in component

2 nodes remain in component

==================final_paths identities after updates: ================
('EDGE_270_length_49661_cov_147.474', 'EDGE_201_length_95_cov_266.333', 'EDGE_134_length_79_cov_402.5', 'EDGE_109_length_112_cov_164.886', 'EDGE_242_length_92_cov_256.467', 'EDGE_90_length_33867_cov_154.374', 'EDGE_278_length_203_cov_192.841', 'EDGE_181_length_13154_cov_136.653', "EDGE_269_length_103_cov_258.308'")

('EDGE_134_length_79_cov_402.5', 'EDGE_110_length_91_cov_243.929', 'EDGE_61_length_58956_cov_140.228')

('EDGE_88_length_3692_cov_10938.4',)

("EDGE_13_length_2024_cov_1430.94'",)

Thank you, Emily

rozovr commented 7 years ago

Try running again without the redirect at the end of your command, '> p9.cycs.fasta'.

It looks like the suffix of your command, '> p9.cycs.fasta', is redirecting Recycler's run-time printing from STDOUT into that file. My guess is that you thought this redirect was necessary to create the fasta output, but in fact the output is created automatically, with no redirect needed. I don't recall what the default prefix is, but there should be a file created with the suffix '.cycs.fasta' that contains the plasmid sequences, unless your redirect is overwriting it.

EmilyHoedt commented 7 years ago

Hi, thanks for replying. I first tried running without '> .cycs.fasta', but the text I showed above just prints to the screen and no output files are generated.

rozovr commented 7 years ago

The output file should be called assembly_graph.cycs.fasta and should be in the same directory as the input file, assembly_graph.fastg. Please check again.
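For anyone else looking for the file, here is a minimal sketch of where to look. It assumes only the naming described in this thread: the output keeps the input graph's base name, swaps the .fastg extension for .cycs.fasta, and sits next to the input. The helper name expected_cycs_path is hypothetical and is not part of Recycler.

```python
from pathlib import Path


def expected_cycs_path(fastg_path):
    """Return where the output should appear, per the naming described in this thread.

    Assumption: output = input base name + '.cycs.fasta', in the input's directory.
    This helper is illustrative and is not part of Recycler itself.
    """
    graph = Path(fastg_path)
    # Path.stem drops the final extension, e.g. 'assembly_graph.fastg' -> 'assembly_graph'.
    return graph.with_name(graph.stem + ".cycs.fasta")


if __name__ == "__main__":
    out = expected_cycs_path("assembly_graph.fastg")
    # Report whether the file is present next to the input graph.
    print(out, "exists" if out.exists() else "not found")
```

The point is that the location follows the input file given to -g, not the directory the command is run from.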

rozovr commented 7 years ago

@EmilyHoedt did you locate the outputs?

EmilyHoedt commented 7 years ago

Sorry, yes, I found it in the same location as my input. I had just assumed it would be written to my working directory. Thank you for your help!
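To check what the .cycs.fasta file contains once it has been found, a small FASTA reader is enough. The sketch below relies only on the general FASTA convention (a header line starting with '>' followed by sequence lines). It makes no assumption about how Recycler names its records, and the function names read_fasta and summarize are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(path):
    """Return a list of (record name, sequence length) for a FASTA file."""
    with open(path) as handle:
        return [(name, len(seq)) for name, seq in read_fasta(handle)]
```

Calling summarize("assembly_graph.cycs.fasta") would then list one entry per sequence in the file, together with its length.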