erikrikarddaniel / pf-gtdb-analyses

Analysis tools for Pfitmap/RNRdb/GTDB
MIT License

R script that subsets data to only RNR proteins #3

Closed: erikrikarddaniel closed this issue 4 years ago

erikrikarddaniel commented 4 years ago

psuperfamily %in% c('Ferritin-like', 'NrdGRE', 'Flavodoxin superfamily', 'NrdR-superfamily')

Read and output both full files and those subset to representative genomes.
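
As a rough illustration of the filter above, here is a minimal base R sketch. It runs on a toy data frame rather than a real feather file: the `accno` column, the example rows and the variable names are assumptions made for illustration, and only `psuperfamily` and the four superfamily names come from the issue text.

```r
# Superfamilies named in the issue description.
rnr_superfamilies <- c(
  "Ferritin-like", "NrdGRE", "Flavodoxin superfamily", "NrdR-superfamily"
)

# Toy stand-in for a table read from a feather file. Every column except
# psuperfamily is a made-up placeholder.
proteins <- data.frame(
  accno        = c("p1", "p2", "p3", "p4"),
  psuperfamily = c("Ferritin-like", "Other superfamily", "NrdGRE", NA),
  stringsAsFactors = FALSE
)

# Keep only rows whose psuperfamily is one of the RNR superfamilies.
# %in% returns FALSE for NA, so rows without a superfamily are dropped.
rnr_proteins <- proteins[proteins$psuperfamily %in% rnr_superfamilies, , drop = FALSE]
```

In a real script the data frame would presumably come from a feather reader, and the same row filter would then be applied to both the full and the representatives-only tables.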

GhadaNOUAIRIA commented 4 years ago

My script reads gtdb feather files and subsets them to

  1. only RNR in gtdb
  2. only RNR, in representatives

Another approach to obtain (2.) would be to read the gtdb representatives feather files and subset them to RNR only.

erikrikarddaniel commented 4 years ago

On Tue, 7 July 2020 at 20:04, GhadaNOUAIRIA notifications@github.com wrote:

My script reads gtdb feather files and subsets them to

  1. only RNR in gtdb
  2. only RNR, in representatives

another approach to obtain (2.) would be to read gtdb representatives feather files and subset only RNR.

I think we should have one script that reads either the full tables or the representatives only, see comment in pull request.
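
One way this could look, as a sketch only: a single function takes a file prefix and applies the same filter whichever set of tables the prefix points at. The file naming scheme, the `table_path` and `subset_rnr` helpers, the `proteins` table name and the example prefixes are all hypothetical and not taken from the repository. The reader and writer are passed in as arguments so the sketch does not depend on a particular feather package.

```r
rnr_superfamilies <- c(
  "Ferritin-like", "NrdGRE", "Flavodoxin superfamily", "NrdR-superfamily"
)

# Hypothetical naming scheme: <prefix>.<table>.feather
table_path <- function(prefix, table, ext = "feather") {
  paste0(prefix, ".", table, ".", ext)
}

# Read one table for the given prefix, keep only the RNR rows and write the
# result under <prefix>.rnr. The prefix decides whether the full or the
# representatives-only table is processed.
subset_rnr <- function(prefix, read_fn, write_fn, table = "proteins") {
  infile  <- table_path(prefix, table)
  outfile <- table_path(paste0(prefix, ".rnr"), table)
  d <- read_fn(infile)
  write_fn(d[d$psuperfamily %in% rnr_superfamilies, , drop = FALSE], outfile)
  invisible(outfile)
}

# Example calls (assume the feather package and these made-up prefixes):
# subset_rnr("gtdb", feather::read_feather, feather::write_feather)
# subset_rnr("gtdb-rep", feather::read_feather, feather::write_feather)
```

Keeping the prefix as the only thing that differs between the two runs means the filtering logic lives in one place, and a makefile can call the same script once per prefix.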

GhadaNOUAIRIA commented 4 years ago

Done/ Should the makefile build all subsets, i.e. subset RNRs from the gtdb feather files and subset RNRs from the representatives feather files? Or should the makefile ask the user to choose the prefix?

erikrikarddaniel commented 4 years ago

There should be targets for both subsets, each dependent on the correct set of input files, which are then called by the all target after the download is done.

(Why did you write "Done/" first, and then a question? I saw the email but didn't think it was a show stopper because I just saw "Done/".)

GhadaNOUAIRIA commented 4 years ago

Separate make targets added to the makefile.