erikrikarddaniel / pf-gtdb-analyses

Analysis tools for Pfitmap/RNRdb/GTDB
MIT License

Add files via upload #2

Closed GhadaNOUAIRIA closed 4 years ago

GhadaNOUAIRIA commented 4 years ago

R script that subsets the GTDB data to representatives only and writes the result to new feather files.

erikrikarddaniel commented 4 years ago

The name of the script is a bit long... One could at least get rid of the _feather part.

The script looks nice and tidy, but I think there are a few simplifications possible. Let's discuss over Skype.

GhadaNOUAIRIA commented 4 years ago

I'm not sure why you wanted to save a table with the taxa subset to only representatives and then use it to subset the accessions. Did you think that the accessions table did not have genome_accno data?

erikrikarddaniel commented 4 years ago

I think it would be good to have one script that creates the RNR subsets. It should take a parameter that decides the prefix of files to read. Easiest is to just call the script like: subset_to_RNRs.R pfitmap-gtdb.

Check this post for how to deal with simple arguments like the above:

http://tuxette.nathalievilla.org/?p=1696

Then use read_feather(sprintf("%s.accessions.feather", prefix)) (where prefix is the name of the variable in which you stored the prefix) to read feather files.
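
A minimal sketch of the suggestion above, assuming the script is run with Rscript and that the feather package is installed. The helper names feather_path() and main() are hypothetical and only illustrate how the prefix argument could be turned into file names; this is not the script from the pull request.

```r
#!/usr/bin/env Rscript
# Hypothetical sketch of: subset_to_RNRs.R <prefix>

# Build a file name of the form "<prefix>.<table>.feather"
# (helper name is an assumption, not taken from the repository)
feather_path <- function(prefix, table) {
  sprintf("%s.%s.feather", prefix, table)
}

# Read the accessions table for the prefix given as first argument
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) {
    stop("Usage: subset_to_RNRs.R <prefix>", call. = FALSE)
  }
  prefix <- args[1]
  feather::read_feather(feather_path(prefix, "accessions"))
}

# Only read files when a prefix was actually passed on the command line
if (!interactive() && length(commandArgs(trailingOnly = TRUE)) > 0) {
  accessions <- main()
}
```

Called as subset_to_RNRs.R pfitmap-gtdb, this would try to read pfitmap-gtdb.accessions.feather from the current directory.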

GhadaNOUAIRIA commented 4 years ago

This is a nice improvement. I did not use the link you sent me (http://tuxette.nathalievilla.org/?p=1696); it seemed different from what we want to do. Anyway, I made the requested changes, tried the script with the argument from the command line, and it worked on my computer. So I pushed it, and I'll add the target to the Makefile.

erikrikarddaniel commented 4 years ago

@GhadaNOUAIRIA, the semi_join() and other comments in 9626dc8 have not been addressed yet, have they?
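
The comments in 9626dc8 are not shown in this thread, so the following is only a general illustration of how dplyr's semi_join() can subset one table by the keys of another. The toy tables and every column except genome_accno are hypothetical.

```r
library(dplyr)

# Toy stand-ins for the taxa and accessions tables;
# "representative" and "accno" are hypothetical column names
taxa <- data.frame(
  genome_accno   = c("G1", "G2", "G3"),
  representative = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
accessions <- data.frame(
  accno        = c("A1", "A2", "A3", "A4"),
  genome_accno = c("G1", "G2", "G3", "G3"),
  stringsAsFactors = FALSE
)

# Keep only the representative genomes
rep_taxa <- filter(taxa, representative)

# Keep the accessions whose genome_accno occurs in rep_taxa;
# semi_join() filters rows of the first table and adds no columns
rep_accessions <- semi_join(accessions, rep_taxa, by = "genome_accno")
```

Unlike inner_join(), semi_join() never duplicates rows of the first table and leaves its columns unchanged, which is why it suits this kind of subsetting.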

GhadaNOUAIRIA commented 4 years ago

They are now.

GhadaNOUAIRIA commented 4 years ago

Done.