bioinfologics / sdg

Sequence Distance Graph framework: graph + reads + mapping + analysis
MIT License

Linked reads DS data format #105

Closed: gonzalogacc closed this issue 5 years ago

gonzalogacc commented 5 years ago

Creating a linked reads datastore (DS) with the 10xseq format from the CLI produces a corrupted DS.

Using the CLI:

sdg-datastore make -t 10xseq -o lirds.lirds -1 ./child/child-link-reads_R1.fastq -2 ./child/child-link-reads_R2.fastq  -n lids1
sdg-datastore

Git origin: https://github.com/bioinfologics/sdg.git -> master
Git commit: de6ab20

Executed command:
/Users/ggarcia/Documents/git_sources/sdg/build/sdg-datastore make -t 10xseq -o lirds.lirds -1 ./child/child-link-reads_R1.fastq -2 ./child/child-link-reads_R2.fastq -n lids1

2019-07-17 13:52:07: Opening: ./child/child-link-reads_R1.fastq
2019-07-17 13:52:07: Detected max read size 250
2019-07-17 13:52:07: Creating Datastore Index from ./child/child-link-reads_R1.fastq | ./child/child-link-reads_R2.fastq
2019-07-17 13:52:07: Building tag sorted chunks of 1000000 pairs
2019-07-17 13:52:08: 1000000 pairs dumping on chunk 0
2019-07-17 13:52:08: 1000000 pairs dumping on chunk 0
2019-07-17 13:52:11: dumped!
2019-07-17 13:52:11: 190776 pairs dumping on chunk 1
2019-07-17 13:52:11: 190776 pairs dumping on chunk 1
2019-07-17 13:52:12: dumped!
2019-07-17 13:52:12: performing merge from disk
2019-07-17 13:52:12: leaving space for 1190776 read_tag entries
2019-07-17 13:52:16: chunk 0 finished
2019-07-17 13:52:16: chunk 1 finished
2019-07-17 13:52:16: writing down 1190776 read_tag entries
2019-07-17 13:52:16: Datastore with 2381552 reads, 595382 reads with tags

Then, in a Python script:

import pysdg as SDG
ws = SDG.WorkSpace()
ws.sdg.load_from_gfa('./initial_graph.gfa')
ws.add_linked_reads_datastore("./lirds.lirds.lrseq", "li1")
lids = ws.linked_reads_datastores[0]
print(lids.get_read_sequence(1))
print(lids.get_read_tag(1))

produces

'``````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````'
'0'

Expected: the call should print the read's sequence and its tag, not a run of backticks.
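
A quick way to spot this kind of corruption is to check that the returned string contains only nucleotide characters. This is a minimal sketch, not part of pysdg: the helper name `looks_like_sequence` is hypothetical, and it assumes valid reads contain only A, C, G, T or N.

```python
# Hypothetical helper, not part of pysdg.
# Assumption: a valid read sequence contains only A, C, G, T or N.
VALID_BASES = frozenset("ACGTN")


def looks_like_sequence(seq: str) -> bool:
    """Return True if seq is non-empty and contains only A, C, G, T or N."""
    return bool(seq) and set(seq.upper()) <= VALID_BASES


# The corrupted output reported above is a run of backticks, so it fails the check.
corrupted = "`" * 250
print(looks_like_sequence(corrupted))
print(looks_like_sequence("ACGTTGCANNAC"))
```

The string returned by `lids.get_read_sequence(1)` in the script above could be passed to the same function. The tag is left out of the check on purpose, because a tag of 0 may simply mean the read has no tag.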

gonzalogacc commented 5 years ago

The seq format has been deleted and replaced by the raw format.