Closed kellystyles closed 3 years ago
Can you please check if you have records in ${ds1}/build/CombinedHmmSearch.txt? Also, you can take a look at ${ds1}/build/CombinedBLASTSearch.txt and make sure there are some hits in there.
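One quick way to run this check is sketched below. It assumes the combined files are plain text with at most one header line (the column names); `count_records` is a hypothetical helper name for illustration, not part of MetaBGC.

```python
from pathlib import Path

def count_records(path, header_lines=1):
    """Return the number of non-blank lines after the header.

    A missing file, an empty file, or a header-only file all give 0.
    """
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open() as fh:
        lines = [ln for ln in fh if ln.strip()]
    return max(0, len(lines) - header_lines)
```

Calling it on each of the two combined files should return a value above 0 if the searches produced hits; a result of 0 means the file is missing, empty, or holds only the column names.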
Sorry, I accidentally closed the issue (new to GitHub). Both of those files are present but empty, except that CombinedBLASTSearch.txt contains the column names and nothing else.
Could you please also check that you have a set of .hmm files in the spHMMs folder? If so, I think the HMMER search is failing for some reason. Which tool did you use to generate the protein sequence read files from the nucleotide sequence files?
Yes, there are .fas files with the segmented protein alignment, and some .hmm files, although some of these are empty.
I agree that it must be something with hmmer search. Maybe that's where the "Illegal instruction (core dumped)" errors are coming from.
For your second question, are you referring to the synthetic reads? If so, I have not translated these as the readme says they are computed if not provided.
I think I understand the immediate problem. Please remove the parameter --prot_seq_directory from your command line and resubmit.
We check if this parameter is pointing to a valid directory. If it is, we assume that it contains the protein files already converted. In this case it finds an empty directory and does not do any HMMER or BLAST searching.
If this parameter is not provided, it converts the nt sequences into aa sequences and saves the files in a folder in the build output directory. The conversion takes longer, however, so for bigger runs it might be better to do the conversion separately in an array job.
I will improve the documentation of this parameter with clearer wording.
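To catch the empty-directory case before submitting a job, a pre-flight check along these lines may help. It is only a sketch of the behaviour described above, not MetaBGC's own code; `prot_dir_status` and its return labels are hypothetical.

```python
from pathlib import Path

def prot_dir_status(prot_seq_directory):
    """Classify a candidate --prot_seq_directory value before launching a build."""
    if prot_seq_directory is None:
        # Parameter omitted: per the explanation above, build does the translation itself.
        return "omitted"
    p = Path(prot_seq_directory)
    if not p.is_dir():
        return "not_a_directory"
    # A valid but empty directory is the case that led to no HMMER or BLAST searching.
    has_files = any(child.is_file() for child in p.iterdir())
    return "has_files" if has_files else "empty"
```

If this returns "empty", either drop the parameter or fill the directory with the translated reads before resubmitting.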
I tried running metabgc-build without the --prot_seq_directory parameter and it still stopped prematurely without translating the synthetic metagenomes.
I then tried working with a subset of my data (2 synthetic metagenomes only). I translated these synthetic reads myself (using transeq to translate into all 6 frames, and appending 'trans' to the end of the resulting file names) and then ran metabgc-build. However, I'm not sure whether the program recognises the files, as there is no difference in the output, and the job keeps running until I stop it (the longest I let it run was 4 days). Is there a naming convention required for these translated synthetic metagenomes?
Please name the protein sequence files exactly the same as the corresponding nucleotide sequence files. Depending on the size of your input files and the number of spHMMs it will take a while to run.
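A small check like the one below can confirm the naming before a long run. It is a sketch that assumes the nucleotide and protein reads sit in two separate flat directories; `unmatched_nucleotide_files` is a hypothetical helper, not a MetaBGC function.

```python
from pathlib import Path

def unmatched_nucleotide_files(nucl_dir, prot_dir):
    """Return names of nucleotide files with no identically named file in prot_dir."""
    nucl = {p.name for p in Path(nucl_dir).iterdir() if p.is_file()}
    prot = {p.name for p in Path(prot_dir).iterdir() if p.is_file()}
    return sorted(nucl - prot)
```

An empty list means every nucleotide file has a protein file with exactly the same name. Any name that is returned still needs a matching protein file, for example one that was saved with a 'trans' suffix and has to be renamed.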
You should see the HMMER search output files being generated in the ${ds1}/build/hmm_result folder. Each of your samples will be searched with HMMER against each spHMM. How many spHMMs do you have? Please count the number of .hmm files in the ${ds1}/build/spHMMs folder.
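The count can be scripted as in the sketch below, which also flags zero-byte .hmm files since some were reported as empty earlier in the thread. The function name `summarize_sphmms` is hypothetical; the only assumption about the layout is that the spHMMs folder holds one .hmm file per model.

```python
from pathlib import Path

def summarize_sphmms(sphmm_dir):
    """Return (number of .hmm files, sorted names of the zero-byte ones)."""
    hmm_files = sorted(Path(sphmm_dir).glob("*.hmm"))
    empty = [p.name for p in hmm_files if p.stat().st_size == 0]
    return len(hmm_files), empty
```

Because every sample is searched against every spHMM, the amount of search work grows roughly with the number of samples times the first value returned here.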
Please provide as many cores as you can using the --cpu option for faster searching.
Renaming the protein sequence files to the same name as the nucleotide sequences allowed metabgc-build to continue a bit further. However, now it is failing and returning the error output below:
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Parsed with column specification:
cols(
X1 = col_character(),
X2 = col_character(),
X3 = col_character(),
X4 = col_character(),
X5 = col_double(),
X6 = col_character(),
X7 = col_character()
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Parsed with column specification:
cols(
sseqid = col_character(),
slen = col_character(),
sstart = col_character(),
send = col_character(),
qseqid = col_character(),
qlen = col_character(),
qstart = col_character(),
qend = col_character(),
pident = col_character(),
evalue = col_character(),
Sample = col_character(),
sampleType = col_character()
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Parsed with column specification:
cols(
gene_name = col_character(),
start = col_double(),
end = col_double(),
interval = col_character(),
prot_type = col_character()
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Error: Can't subset columns that don't exist.
✖ Column `model_cov` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In addition:
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: There were 12 warnings (use warnings() to see them)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]:
It seems to suggest an issue with the EvaluateSpHMMs.R script, specifically with the model_cov column. I'm not familiar with R though, so perhaps it is just that no output was written for that column? How crucial is it to have R v3.6.1? I am only able to install R v3.6.3.
My protein alignment is split into 23 .hmm files. All the HMM search results seem to be in 'CombinedHmmSearch.txt'. However, there is still no BLAST output in the 'blastn_result' directory.
We have released version 2.0.0 of the tool. All R dependencies have been removed, making build easier to run. A toy build example can be downloaded and tested here: https://github.com/donia-lab/MetaBGC#quick-start
Hello, I have been trying to find spHMMs for my protein family of interest using metabgc build, but I haven't been able to successfully complete a metabgc build run. I have prepared synthetic metagenomes as in the supplementary of the metabgc paper (although I only prepared 10 each for high and low synthetic metagenomes for testing).
I have been running the software from a Singularity container with all the prerequisites installed as required in the documentation. The metabgc search function works with the toy data in this container, so I think the issue is with the build function. I have also tried running metabgc build from a copy of metabgc installed on my university HPC, to no avail. Again, the metabgc search function works with the toy data in this locally installed version.
I'm not sure if it's just an issue with my inputs or something more, but I would appreciate any help!
A typical run finishes by reporting in the output that MetaBGC Build failed. The script I used, the error output, and the output are below:
run script
error output
output