drostlab / orthologr

Genome wide orthology inference and dNdS estimation
https://drostlab.github.io/orthologr/
GNU General Public License v2.0

Control output value of orthologs function #24

Open janstrauss1 opened 4 years ago

janstrauss1 commented 4 years ago

Hi @HajkD ,

I'm using the orthologs function with ortho_detection = "RBH" to detect orthologs for a query_file containing 8 protein fasta sequences against multiple subject_files.

For easier downstream data parsing, I would like the orthologs function to output a row for every query_id, including queries that did not return any hits (i.e. fill the result table with NA).

I would appreciate any help on how to achieve this.

Many thanks in advance!

Jan

HajkD commented 4 years ago

Hi Jan,

Many thanks for contacting me about this.

Would it be possible to create a small example with 3 query sequences and two sets of 5 subject sequences each, so that I can be sure that I have understood your request correctly?

So you would like to retain query_ids in a data.frame even if they didn't produce subject hits (encoded by NA lines)?

If yes, I assume that simply doing a dplyr::full_join() by query_id between the initial input query_ids (stored as a data.frame) and the result table generated by orthologr::orthologs() is not sufficient? If not, could you maybe specify what you had in mind?
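
For illustration, a minimal sketch of that join could look like the following. It does not call orthologr::orthologs() itself: the `hits` table is mock data standing in for its result, and the IDs and the `subject_id` column are assumptions made up for this example, so the real output will have more columns and possibly different names.

```r
library(dplyr)

# All query IDs from the input FASTA file (hypothetical IDs for illustration).
query_ids <- data.frame(
  query_id = c("q1", "q2", "q3"),
  stringsAsFactors = FALSE
)

# Mock stand-in for the table returned by orthologr::orthologs();
# here query "q2" produced no hit and is therefore missing.
hits <- data.frame(
  query_id   = c("q1", "q3"),
  subject_id = c("s4", "s7"),
  stringsAsFactors = FALSE
)

# Join on query_id: queries without a hit are kept, with NA in the hit columns.
result <- dplyr::full_join(query_ids, hits, by = "query_id")
print(result)
```

With several subject_files, the same join could be repeated for each per-subject result table, so that every table has one row per query_id.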

I hope this helps and goes in the direction you had in mind.

Cheers, Hajk