drostlab / orthologr

Genome wide orthology inference and dNdS estimation
https://drostlab.github.io/orthologr/
GNU General Public License v2.0
88 stars 27 forks source link

Missing headings blast_rec #33

Open ireneortega opened 2 years ago

ireneortega commented 2 years ago

When using blast_rec, headings in blastresult{query}.fasta_ and blastresult{subject}.fasta_ are not included. I suppose they are the same as output from blast_rec. Right?

HajkD commented 2 years ago

Hi @ireneortega

I am not 100% sure what you are referring to when asking about headings in blastresult{query}.fasta vs blastresult{subject}.fasta, since both should have the headers of the respective query-id.

When you run the example:

orthologr::dNdS(query_file      = system.file('seqs/ortho_thal_cds.fasta', package = 'orthologr'),
     subject_file    = system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr'),
     delete_corrupt_cds = TRUE, # coding sequences that cannot be divided by 3 (triplets) will be removed
     ortho_detection = "RBH", # perform BLAST best reciprocal hit orthology inference
     aa_aln_type     = "pairwise", # perform pairwise global alignments of AA seqs 
     aa_aln_tool     = "NW", # using Needleman-Wunsch
     codon_aln_tool  = "pal2nal", # perform codon alignments using the tool Pal2Nal
     dnds_est.method = "Comeron", # use Comeron's method for dN/dS inference
     comp_cores      = 1 )

And then check the _blast_db folder in the tempdir() path, you will see that headers are included.

I hope this helps you to troubleshoot on your side.

Cheers, Hajk

ireneortega commented 2 years ago

Ok, but I am referring to the heading of the BLAST output files that are generated when running:

blast_rec(...,
                                             save.output = file.path)

The heading only appear if I save the output of the function blast_rec to directory.

HajkD commented 2 years ago

Could you construct a small example using the system file datasets system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr')? This will make it easier to trouble shoot on my end.

Cheers, Hajk

ireneortega commented 2 years ago

I run:

genome_ortho <- blast_rec(query_file = "C:/Users/irene/Documents/R/win-library/4.1/orthologr/seqs/ortho_lyra_cds.fasta",
                          subject_file = "C:/Users/irene/Desktop/prueba/200.fasta",
                          seq_type = "cds",
                          format = "fasta",
                          blast_algorithm = "blastn",
                          delete_corrupt_cds = FALSE,
                          save.output = file.path(working_dir)
)

readr::write_csv(genome_ortho, "C:/Users/irene/Desktop/prueba/ortho.tab", col_names = TRUE)

And the output files are: image You can see that headers are only printed in the ortho.tab file and not in the blastresult_{query}.fasta and blastresult_{subject}.fasta files.

HajkD commented 2 years ago

Ok, now I understand your question:

I suppose they are the same as output from blast_rec. Right?

Yes, the column names returned by blast_rec() correspond to the columns in blastresult_{query}.fasta.csv and blastresult_{subject}.fasta.csv.

My confusion stems from the fact that you asked about headers in the fasta file e.g. blastresult_{query}.fasta, but you were referring to the columns of the corresponding output *.csv file blastresult_{query}.fasta.csv (which you also show in the command line screenshot).

I hope this clarifies everything.

Cheers, Hajk

ireneortega commented 2 years ago

Yes, this is what I meant. Thanks. Maybe it would be a good idea if you add the name of columns in both output files, just in case other user encounters the same issue.

HajkD commented 2 years ago

Since the BLAST output format -6 is defined without header, I prefer to leave it this way for the default option, but I could add a new argument that allows you to export the column names as well. In addition, I can also document the columns which are a standard format for all BLAST hit tables.