cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
Clean up data situation #2

Closed sampsyo closed 2 years ago

sampsyo commented 2 years ago

As mentioned in https://github.com/cucapra/calyx-pangenome/pull/1#issuecomment-1098510248, it is good form not to commit "derived" or "other people's" data to our repository. Instead, this PR includes "recipes" to recreate the data. This way, people can see where the data comes from, and our code becomes more reusable.

Namely, here's what I did:

  1. Removed the .og files from this repository. (We're not using them yet anyway.)
  2. Created a Makefile target to automatically fetch some GFA files for us to use. (Try make fetch.)
  3. Created another Makefile target to automatically convert these to .og files, if we ever want that.
  4. Refactored the tiny Python depth script to take an input GFA file on the command line, instead of hard-coding it to use k.gfa.
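
For illustration, here is a rough sketch of what a depth script that takes its GFA file on the command line might look like. This is not the script from this PR: the function names `node_depths` and `print_depths` are hypothetical, and it assumes that "depth" means the number of path steps (GFA 1 `P` lines) that visit each segment.

```python
import sys
from collections import Counter


def node_depths(gfa_lines):
    """Count how many path steps visit each segment (assumed meaning of depth)."""
    depths = Counter()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # Register every segment so that unvisited ones report depth 0.
            depths.setdefault(fields[1], 0)
        elif fields[0] == "P" and len(fields) > 2:
            # P lines list steps as comma-separated segment names,
            # each followed by a + or - orientation character.
            for step in fields[2].split(","):
                if step:
                    depths[step[:-1]] += 1
    return depths


def print_depths(path):
    """Read the GFA file at `path` and print one `segment<TAB>depth` row each."""
    with open(path) as f:
        depths = node_depths(f)
    for name, depth in depths.items():
        print(f"{name}\t{depth}")


if __name__ == "__main__":
    # The input file comes from the command line instead of a hard-coded name.
    if len(sys.argv) == 2:
        print_depths(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} FILE.gfa", file=sys.stderr)
```

Saved under a hypothetical name such as depth.py, it would be run as `python depth.py k.gfa`, so any GFA file fetched by the Makefile can be passed in without editing the script.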