Open ShaiberAlon opened 7 years ago
My mistake! I see that there is an output '*distanceTable'. Sorry for that.
But all the distances I got are either 1.0 or 0.0. Does that make sense? Also, is there a way to find out which contigs ended up in which scaffold? And if I understand correctly, the file '*network.gexf' has the orientation for each edge. Is there a way to also find, for each contig, which strand it belongs to (i.e. whether MeDuSa converted the contig's sequence to its reverse complement)?
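For anyone who wants to inspect the edges in the meantime, here is a minimal sketch that lists every edge in a GEXF file together with whatever attributes it carries. It assumes only that '*network.gexf' is standard GEXF XML with `<edge>` elements; which attribute MeDuSa uses to store the orientation is not stated in this thread, so the sketch does not interpret the values. The function names `local` and `read_gexf_edges` are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix, e.g. '{http://...}edge' -> 'edge'."""
    return tag.rsplit("}", 1)[-1]


def read_gexf_edges(source):
    """Return one dict per <edge> element in a GEXF file.

    Each dict holds the edge's XML attributes (id, source, target, ...)
    plus any nested <attvalue> entries, stored under the key 'att:<for>'.
    `source` may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    edges = []
    for elem in root.iter():
        if local(elem.tag) != "edge":
            continue
        record = dict(elem.attrib)
        for child in elem.iter():
            if local(child.tag) == "attvalue":
                record["att:" + child.get("for", "")] = child.get("value")
        edges.append(record)
    return edges
```

Calling `read_gexf_edges("my_network.gexf")` (the path is a placeholder) and printing the result shows which contigs are joined by each edge and which attributes are available.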
Hi
> But I see that all the distances I got are either 1.0 or 0.0 does that make sense?
Can you paste the command line you used here?
> Also, is there a way to find out which contigs ended up in which scaffold? And also, if I understand correctly the file '*network.gexf' has the orientation for each edge. Is there a way to also find, for each contig, to which strand it belongs (i.e. was the sequence of the contig converted to the reverse complement by MeDuSa)?
I still have to do that. I could merge both pieces of information into a single file; could you please tell me what you think is the best way to do it? I'm looking at http://www.ebi.ac.uk/ena for usable formats.
Hi,
Thank you very much for your quick response!
The command I used is:
java -jar medusa.jar -f 01_FASTA/ -i p214_sequence-FASTA-estimated-distance.fa -d -v -gexf -o p214_sequence-FASTA-medusa-fixed-estimated-distance.fa
If you wish, I could also provide you with the files I used (but note that I used 106 reference genomes).
As for the format for exporting the contig arrangement, I think a table with the full information on the contigs would be best. I would suggest a table where each row corresponds to a contig in the input FASTA file, with the following columns: contig_number, contig_name, scaffold_name, contig_orientation, contig_reverse_complimented
Where:
contig_number: a serial number from 1 to N (N is the number of contigs in the input), following the order of the contigs in the final output FASTA file
contig_name: the name of the contig in the input FASTA file
scaffold_name: the name of the scaffold, in the output FASTA file, that contains the contig
contig_orientation: 1 or -1 according to whether the orientation of the contig was kept or changed
contig_reverse_complimented: 1 or -1 according to whether the contig's sequence was kept as-is or reverse-complemented
In addition, if distance estimation is performed (-d), then I would suggest adding another column, estimated_distance_to_next_contig (where the contigs at the end of a scaffold would just have a NaN or NA).
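To make the proposal concrete, here is a minimal sketch of how such a table could be written as a tab-separated file. It is only an illustration of the suggested columns, not MeDuSa's output: the input structure (a dict mapping each scaffold name to its ordered contigs) and the function names `layout_rows` and `write_table` are assumptions made for this example.

```python
import csv

# Column names as proposed above (spelling kept exactly as in the proposal).
COLUMNS = [
    "contig_number",
    "contig_name",
    "scaffold_name",
    "contig_orientation",
    "contig_reverse_complimented",
    "estimated_distance_to_next_contig",
]


def layout_rows(scaffolds):
    """Build one row per contig.

    `scaffolds` is a hypothetical structure: scaffold name -> ordered list of
    (contig_name, orientation, reverse_complemented, distance_to_next),
    where orientation and reverse_complemented are 1 or -1 and
    distance_to_next may be None when no estimate exists.
    """
    rows = []
    number = 0
    for scaffold_name, contigs in scaffolds.items():
        for i, (name, orientation, revcomp, distance) in enumerate(contigs):
            number += 1
            is_last = i == len(contigs) - 1
            rows.append({
                "contig_number": number,
                "contig_name": name,
                "scaffold_name": scaffold_name,
                "contig_orientation": orientation,
                "contig_reverse_complimented": revcomp,
                # The last contig of a scaffold has no next contig.
                "estimated_distance_to_next_contig":
                    "NA" if is_last or distance is None else distance,
            })
    return rows


def write_table(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Using "NA" rather than an empty cell keeps every row at the same number of fields, which makes the file easier to load in R or pandas.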
I hope this is helpful!
Hello, in addition to what @ShaiberAlon mentioned, coordinates of contig x in scaffold y (start, end) would be very helpful.
So maybe you could extend your output with a separate, parsable (.tsv) file containing one line for each input contig:
contig_number, contig_name, scaffold_name, contig_orientation, contig_start, contig_end
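As a rough sketch of how contig_start and contig_end could be derived, the coordinates follow from the contig lengths and the number of gap bases placed between consecutive contigs. The function name `contig_coordinates` and the 1-based inclusive convention are assumptions for this example; how MeDuSa actually sizes the gaps is not stated in this thread.

```python
def contig_coordinates(lengths, gaps):
    """Return 1-based inclusive (start, end) for each contig in a scaffold.

    lengths: contig lengths in scaffold order.
    gaps: number of gap bases between consecutive contigs,
          so it must have exactly len(lengths) - 1 entries.
    """
    if len(gaps) != max(len(lengths) - 1, 0):
        raise ValueError("need exactly one gap between each pair of contigs")
    coords = []
    start = 1
    for i, length in enumerate(lengths):
        end = start + length - 1
        coords.append((start, end))
        if i < len(gaps):
            # The next contig begins right after this contig and its gap.
            start = end + gaps[i] + 1
    return coords
```

For example, three contigs of 100, 50 and 30 bases separated by gaps of 100 and 10 bases give (1, 100), (201, 250) and (261, 290).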
Thanks a lot for this excellent tool!
When using the -d option (for distance estimation), could there be a way to get the distance estimation as an output (for example in the SUMMARY file)?