Open halexand opened 4 years ago
Hi, sorry for the delay, I was on vacation (does GitHub have a notification system for that?). Glad you want to test EukCC. Which version are you running?
eukcc --version
To address the problem: we noticed that you can omit the bed file in most cases, so for now I would recommend simply removing the --bed flag. EukCC should then run fine.
To explain: EukCC uses bed files indicating the start of the first exon and the stop of the last exon of each protein to remove close hits of the same PANTHER profile. So each protein should have only one entry, which is different from what you supplied. It's a format we agreed on using, but it's not perfect and will likely be replaced in the update of the database.
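To illustrate the "one entry per protein" idea, here is a minimal Python sketch that collapses several bed rows sharing a name into a single row spanning the smallest start to the largest end. It is not EukCC's own code. The function name `collapse_bed_entries` is hypothetical, and the sketch assumes that the fourth column already holds the protein ID and that all rows of a protein lie on the same contig. Output from a GFF converter may carry feature-level IDs in that column instead, in which case the names would first need to be mapped to protein IDs. Whether EukCC accepts exactly this four-column layout is also an assumption.

```python
from collections import OrderedDict


def collapse_bed_entries(lines):
    """Collapse bed rows that share a name into one row per name.

    Assumption: column 4 is the protein ID and all rows for a protein
    are on the same contig. The result keeps only chrom, start, end, name.
    """
    spans = OrderedDict()  # name -> [chrom, start, end], in first-seen order
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, so the row cannot be assigned
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if name not in spans:
            spans[name] = [chrom, start, end]
        else:
            span = spans[name]
            span[1] = min(span[1], start)  # start of the first exon
            span[2] = max(span[2], end)    # stop of the last exon
    return ["\t".join([c, str(s), str(e), n]) for n, (c, s, e) in spans.items()]
```

Calling it on the lines of a bed file and writing the returned rows to a new file would give one row per name, which is the shape described above.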
If you would be willing to share your protein files and the bed file, I would be very interested in fixing this bug, as it would be best for EukCC to work as expected. Feel free to email me (saary@ebi.ac.uk).
Hi @openpaul,
No worries and thanks for the reply! I ended up dropping the --bed flag and running it with just the protein file. It runs so quickly that it isn't a big deal to rerun with --bed. I am glad to hear that the problem that the addition of a bed file solves isn't a huge issue!
I am running EukCC version 0.1.5.1. I can email you some files as I think they are too large to attach.
Glad to hear it worked out for you. Yes, feel free to email me so I can possibly address the core issue.
Hello,
I am curious to try out eukcc with some eukaryotic MAGs. I have already predicted proteins and am trying to run eukcc with the predicted proteins and associated bed files. However, I am getting an error when I include the bed file, which I think must be due to the format of my bed file.
The command I am using is:
eukcc --db /vortexfs1/omics/alexander/data/databases/eukccdb -o NAO-all-SRF-20-180-00_bin-42 --protein NAO-all-SRF-20-180-00_bin-42.all.maker.proteins.fasta --bed NAO-all-SRF-20-180-00_bin-42.bed --ncores 8
Notably, when I grep for the missing bed entry in the bed file, it comes up:
I created the bed file by converting a gff3 file created by maker2 with:
gff2bed < in.gff > out.bed
Is there a preferred method for creating a bed file? Or am I missing something else? Thank you!
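Since the reply in this thread notes that each protein should have only one bed entry, a quick check on a converted bed file is to count how often each name appears. The Python sketch below does that. The function name `duplicated_bed_names` is hypothetical, and it assumes a tab-separated file with the entry name in the fourth column.

```python
from collections import Counter


def duplicated_bed_names(lines):
    """Return {name: count} for names that occur on more than one bed row.

    Assumption: tab-separated rows with the entry name in column 4.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

An empty result would mean every name occurs once. Any names it reports are candidates for the multiple-entry situation described in the reply.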