cpockrandt / genmap

GenMap - Fast and Exact Computation of Genome Mappability

Better explanation of CSV output #15

Closed: duartemolha closed this issue 4 years ago

duartemolha commented 4 years ago

I tried to find this in the help documentation, but couldn't.

Can you please explain the output of the CSV file when using the -d option?

Thanks

Duarte

duartemolha commented 4 years ago

The reason I ask is that some of the lines I got look like this:

22,38285985;0,166400969|22,38285985
22,38286465;9,129780021|9,129780042|9,129780063|9,129780084|9,129780105|9,129780126|22,38286444|22,38286465

What does this mean exactly?
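A minimal sketch for reading such lines, assuming each one has the layout the sample suggests: SEQ,POS;SEQ,POS|SEQ,POS|..., where the first field is the k-mer's own location and the second lists every place it occurs, and SEQ is a 0-based sequence index. The function name explain_genmap_csv is hypothetical and not part of GenMap; the wiki section on the CSV format remains the authoritative description.

```shell
# Hypothetical helper, not part of GenMap: print each CSV line in words.
# Assumed layout (inferred from the sample lines, not from the docs):
#   SEQ,POS;SEQ,POS|SEQ,POS|...
# field 1 = where the k-mer is located, field 2 = every place it occurs,
# separated by "|" (in the samples this includes the k-mer's own location).
# SEQ is assumed to be the 0-based index of the sequence in the FASTA file.
# Lines that do not start with a digit (e.g. a header row) are skipped.
explain_genmap_csv() {
  awk -F';' '
    !/^[0-9]/ { next }
    {
      split($1, kmer, ",")
      m = split($2, occ, "|")
      printf "k-mer at seq %s, pos %s: %d occurrence(s)\n", kmer[1], kmer[2], m
      for (j = 1; j <= m; j++) {
        split(occ[j], o, ",")
        printf "  seq %s, pos %s\n", o[1], o[2]
      }
    }' "$@"
}
```

For example, printf '%s\n' '22,38285985;0,166400969|22,38285985' | explain_genmap_csv reports two occurrences: one at seq 0, pos 166400969 and one at seq 22, pos 38285985.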

cpockrandt commented 4 years ago

Good point! I added a section to the wiki explaining the CSV format.

Feel free to reopen the issue if there are any questions left!

duartemolha commented 4 years ago

Would it be possible to use the sequence names from the FASTA file instead of their numbers?

It would be much more helpful if it said chr1 instead of 0, chr2 instead of 1, etc.

For the example I gave above, I am assuming that this: 22,38285985;0,166400969|22,38285985

refers to the 23rd sequence in the indexed FASTA file (chrX), since the numbering starts at 0,

so it reads as chrX,38285985;chr1,166400969|chrX,38285985

Correct?
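To check which sequence an index stands for, a small sketch that numbers the FASTA headers from 0, the same numbering the awk command further down in this thread relies on. The name fasta_index_names is hypothetical, and keeping only the first word of each header (">chrX some description" becomes "chrX") is an assumption about how you want names shown.

```shell
# Hypothetical helper: print "INDEX<TAB>NAME" for every FASTA header,
# numbering from 0 and keeping only the first word after ">".
fasta_index_names() {
  awk '/^>/ { print (n++) "\t" substr($1, 2) }' "$@"
}
```

For example, fasta_index_names genome.fa | awk -F'\t' '$1 == 22' prints the header of the 23rd sequence; whether that is chrX depends on the order of sequences in your FASTA file.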

cpockrandt commented 4 years ago

Unfortunately, that would bloat the CSV file even more. If you still have the FASTA file lying around, you can replace the numbers with the sequence names using a short awk command:

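awk -F';' -v OFS=';' '
  # 1st file (FASTA): remember the first word of each header, numbered from 0
  FNR == NR { if (/^>/) { split($0, h, " "); name[n++] = substr(h[1], 2) } next }
  # 2nd file (CSV): in each "index,position" entry, replace the index with its name
  {
    for (i = 1; i <= NF; i++) {
      m = split($i, occ, "|"); s = ""
      for (j = 1; j <= m; j++) {
        c = index(occ[j], ",")
        idx = substr(occ[j], 1, c - 1)
        if (c > 0 && (idx in name)) occ[j] = name[idx] substr(occ[j], c)
        s = s (j > 1 ? "|" : "") occ[j]
      }
      $i = s
    }
    print
  }' genome.fa genome.genmap.csv > new.csv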
duartemolha commented 4 years ago

OK, thanks! That is pretty much what I had done :)
