cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

slow-odgi: tidying #36

Closed: anshumanmohan closed this 1 year ago

anshumanmohan commented 1 year ago

In addition to #33, some other high-level things that should be done after merging https://github.com/cucapra/pollen/pull/29 and https://github.com/cucapra/pollen/pull/31:

sampsyo commented 1 year ago

Test against more of the graphs that are already in the odgi repo. I presume the present selection that we fetch with make fetch was chosen because those graphs are of a reasonable size?

FWIW, the files checked into the odgi repo are all pretty small. As of this writing, the ones we fetch are listed on the first line of the Makefile: https://github.com/cucapra/pollen/blob/af62febce2a4ae1afe427ecf0ce89cde2c55e416/Makefile#L1

Measuring those sizes, we have:

$ for name in t k note5 overlap q.chop DRB1-3123 LPA ; do ls -lh $name.gfa ; done
-rw-r--r--@ 1 asampson  staff   491B Nov 24  2021 t.gfa
-rw-r--r--@ 1 asampson  staff   561B Nov 24  2021 k.gfa
-rw-r--r--@ 1 asampson  staff   168B Nov 24  2021 note5.gfa
-rw-r--r--@ 1 asampson  staff   362B Nov 24  2021 overlap.gfa
-rw-r--r--@ 1 asampson  staff   1.4K Nov 24  2021 q.chop.gfa
-rw-r--r--@ 1 asampson  staff   452K Nov 24  2021 DRB1-3123.gfa
-rw-r--r--@ 1 asampson  staff   1.5M Nov 24  2021 LPA.gfa

So they range from trivial (168B for note5.gfa) to tiny (1.5M for LPA.gfa). The exception, of course, is chr8, which is properly big (and not in the repo for that reason):

$ ls -lh chr8.pan.gfa
-rw-r--r--  1 asampson  staff   3.9G Feb  9 07:48 chr8.pan.gfa
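If we want the "reasonable size" cutoff to be explicit rather than implied by which files we fetch, here is a minimal sketch. It assumes the .gfa files sit in the current directory. The function name small_gfas and the 16 MiB cap are hypothetical and come from neither the pollen Makefile nor odgi. The function prints each graph at or under the cap as "<bytes> <file>", smallest first, so big graphs like chr8 drop out automatically.

```shell
#!/bin/sh
# Hypothetical helper (not part of pollen): list the .gfa graphs in the current
# directory whose size is at most MAX_BYTES, printed as "<bytes> <file>",
# sorted from smallest to largest.
small_gfas() {
  max_bytes="$1"
  for f in *.gfa; do
    [ -f "$f" ] || continue              # the glob stays literal when nothing matches
    size=$(wc -c < "$f" | tr -d ' ')     # exact byte count; tr strips BSD wc padding
    if [ "$size" -le "$max_bytes" ]; then
      printf '%s %s\n' "$size" "$f"
    fi
  done | sort -n
}

# Example with an assumed 16 MiB cap: this would keep every graph measured above
# (the largest is LPA.gfa at 1.5M) and skip chr8.pan.gfa (3.9G).
small_gfas $((16 * 1024 * 1024))
```

Using exact byte counts from wc -c, rather than parsing the human-readable ls -lh column, keeps the comparison simple and portable between macOS and Linux.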