Closed GhadaNOUAIRIA closed 4 years ago
The name of the script is a bit long... One could at least get rid of the _feather part.
The script looks nice and tidy, but I think a few simplifications are possible. Let's discuss over Skype.
I'm not sure why you wanted to save a table with the taxa subsetted to only representatives, and then use it to subset accessions. Did you think that the accessions table did not have genome_accno data?
I think it would be good to have one script that creates the RNR subsets. It should take a parameter that decides the prefix of the files to read. Easiest is to just call the script like: subset_to_RNRs.R pfitmap-gtdb
Check this post for how to deal with simple arguments like the above:
http://tuxette.nathalievilla.org/?p=1696
Then use read_feather(sprintf("%s.accessions.feather", prefix)) (where prefix is the name of the variable in which you stored the prefix) to read feather files.
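A minimal sketch of the prefix handling described above, in case it helps. It assumes the prefix is the first positional argument and uses base R's commandArgs(); the helper names get_prefix and feather_name are hypothetical and not taken from the actual script.

```r
#!/usr/bin/env Rscript
# Sketch only: intended to be called as
#   Rscript subset_to_RNRs.R pfitmap-gtdb

# Hypothetical helper: return the first positional argument,
# or stop with a usage message if none was given.
get_prefix <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) {
    stop("Usage: subset_to_RNRs.R <prefix>", call. = FALSE)
  }
  args[1]
}

# Hypothetical helper: build a file name such as
# "pfitmap-gtdb.accessions.feather" from a prefix and a table name.
feather_name <- function(prefix, table) {
  sprintf("%s.%s.feather", prefix, table)
}

# In the script itself one would then write, for example
# (requires the feather package and the input file to exist):
#   prefix     <- get_prefix()
#   accessions <- feather::read_feather(feather_name(prefix, "accessions"))
```

Keeping the file-name construction in one small function means the same prefix can be reused for every table the script reads or writes.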
This is a nice improvement. I did not use the link you sent me (http://tuxette.nathalievilla.org/?p=1696); it seemed different from what we want to do. Anyway, I made the requested changes and tried the script with the argument from the command line, and it worked on my computer. So I pushed it, and I'll add the target to the Makefile.
@GhadaNOUAIRIA, the semi_join() and other comments in 9626dc8 are not addressed yet, or?
They are now.
Done.
R script to subset GTDB data to only representatives and write them in new feather files