pthieringer opened this issue 1 year ago
Hi Patrick,
Thanks for using coinfinder! Could you please paste the first few lines of your `genes_presence_absence_coinfinder.txt` file here (e.g. `head -n 4 genes_presence_absence_coinfinder.txt`)? I think this is just a small formatting issue.
Sure thing! Here are the first few lines of my file:
GC_00000001 AAA007_O23 GC_00000001 AAA282_K18 GC_00000001 AAA286_D17 GC_00000001 AAA287_E17
As an update, I think I figured out what was causing the issue, with a little help from a colleague. It seems that when Coinfinder takes a tab-delimited list and converts it into Roary format, it somehow adds a blank row at the top of the `sorted.tmp` file.
Our solution was to hardcode a small fix into the `create_roary.py` file by adding the code below. EDIT: bold was not working, so I marked the changes with comments at the side.
```python
import sys

# Excerpt from create_roary.py: loc_len, gen_hash (genome name -> column index)
# and the output handle `roary` are set up earlier in the script (not shown).
with open("sorted.tmp", "r") as f:
    line = f.readline()
    while True:
        if line == "\t\n":  # From here: skip the blank entry at the top of sorted.tmp
            line = f.readline()
            continue  # To here is the updated code chunk
        if not line:
            break  # EOF
        # Make/empty genomeloc array (one slot per genome)
        genomeloc = ["" for _ in range(loc_len)]
        try:
            geneID = line.split("\t")[0]
        except IndexError:
            print("Cannot create geneID from line: " + line)
            sys.exit(1)
        # Collect every genome that carries this gene
        while line.split("\t")[0] == geneID:
            # Get location for corresponding genome
            gen = line.split("\t")[1].strip()
            x = gen_hash[gen]
            # Append current entry to array
            if genomeloc[x] == "":
                genomeloc[x] = gen + "_" + geneID
            else:
                genomeloc[x] = genomeloc[x] + " " + gen + "_" + geneID
            line = f.readline()
        # One quoted column per genome, written after the gene ID and 13 empty columns
        bulk = "".join(',"{}"'.format(loc) for loc in genomeloc)
        roary.write('"' + geneID + '"' + ',""' * 13 + bulk + "\n")
```
I don't know if there's a simpler or more effective way of doing this, but I was able to get everything to run after this fix! Let me know if you need any additional information or code.
From the output of the top of your file, it looks like there is more than one gene/genome pair per line? Or is that just GitHub's formatting? Coinfinder would expect the file to look like:
`GC_00000001	AAA007_O23`
`GC_00000001	AAA282_K18`
`GC_00000001	AAA286_D17`
`GC_00000001	AAA287_E17`
where `GC_00000001` is one of your genes and `AAA007_O23` is one of your genomes, with the two columns separated by a tab.
But considering I got GitHub to struggle to show what I wanted above, I imagine it's just a GitHub formatting thing!
In which case... I can't see where a blank line would purposefully be written to the top of `sorted.tmp` in `create_roary.py`. I doubt there is much harm in just removing the blank line, but it might be worth tracing it back to be sure there isn't a bug that is, e.g., removing a gene and leaving the blank line in its place. I wonder if there might be a blank line at the bottom of your input file? My code only checks that there are 2 columns of information in the first line of the input file; when it sorts on lines 31-35, if there was a blank line, perhaps an empty gene ID would be sorted to the top of `sorted.tmp`?
This little test suggests that's a possibility anyway! Let me know if this ends up being the issue and I'll improve the code to detect and remove blank lines.
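Here is a small sketch of that hypothesis. It does not reproduce the real `create_roary.py`; it assumes, purely for illustration, that each input row is rewritten as `gene<TAB>genome` and the rows are then sorted (`rewrite_and_sort` is a hypothetical name). Under that assumption, a trailing blank line turns into an entry with an empty gene ID, and because the tab character sorts before any letter or digit, that `"\t\n"` entry ends up first, which matches the line the patch above skips.

```python
import io
from typing import Iterable, List


def rewrite_and_sort(rows: Iterable[str]) -> List[str]:
    """Hypothetical model: rewrite each row as gene<TAB>genome, then sort."""
    entries = []
    for raw in rows:
        fields = raw.rstrip("\n").split("\t")
        gene = fields[0]
        # A blank row has no second field, so the genome is empty too
        genome = fields[1] if len(fields) > 1 else ""
        entries.append(gene + "\t" + genome + "\n")
    return sorted(entries)


# Toy input (names from this thread) with a trailing blank line
toy = io.StringIO(
    "GC_00000001\tAAA007_O23\n"
    "GC_00000001\tAAA282_K18\n"
    "\n"
)
for entry in rewrite_and_sort(toy):
    print(repr(entry))
```

The first entry printed is `'\t\n'`, followed by the two `GC_00000001` rows.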
Hi Fiona!
Thanks so much for the thorough reply! First, yes, my copy-and-paste of the gene presence/absence file was formatted weirdly by GitHub. The file looks as you expect, with the genes in the first column and the genome/MAG names in the second column.
I just did a quick test using `tail` to see if there was a blank line at the end of the file... and there it was :)
So it does look like the blank line at the end of my input gets placed at the top and stops the code from running properly, but that was user error on my part! Thanks for nailing down this issue; hopefully this will be an easy fix for others who run into it in the future.
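For anyone else who hits this, a minimal Python sketch that writes a copy of the input with empty or whitespace-only lines removed, so the cleaned copy can be passed to `coinfinder -i` instead. The function names `clean_lines`/`clean_file` and any output filename you choose are hypothetical; this is not part of coinfinder.

```python
from typing import Iterable, List, Tuple


def clean_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (kept_lines, n_dropped), skipping empty or whitespace-only lines."""
    kept, dropped = [], 0
    for line in lines:
        if line.strip() == "":
            dropped += 1
            continue
        # Make sure every kept line ends with a newline
        kept.append(line if line.endswith("\n") else line + "\n")
    return kept, dropped


def clean_file(src: str, dst: str) -> int:
    """Write a copy of src to dst without blank lines; return how many were removed."""
    with open(src) as fin:
        kept, dropped = clean_lines(fin)
    with open(dst, "w") as fout:
        fout.writelines(kept)
    return dropped
```

For example, `clean_file("genes_presence_absence_coinfinder.txt", "genes_presence_absence_clean.txt")` returns the number of blank lines it dropped; the second filename is just an illustration.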
Thanks again for all the help and feedback!
Hi Patrick,
So great to hear this was an easy fix! I'm going to leave this issue open until I have a chance to improve the code so that it reports this as a clearer, easier-to-navigate error in the future.
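One way such a check could look, as a sketch only: this is not coinfinder's code, and `find_bad_rows`/`check_presence_absence` are hypothetical names. It scans the whole input rather than just the first line and reports every blank row, or row that is not exactly two tab-separated columns, with its line number. Rows holding several gene/genome pairs, the concern raised earlier in this thread, are flagged too.

```python
from typing import Iterable, List


def find_bad_rows(lines: Iterable[str]) -> List[str]:
    """Describe every row that is not exactly gene<TAB>genome (1-based line numbers)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if row.strip() == "":
            problems.append(f"line {lineno}: blank line")
            continue
        fields = row.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            problems.append(f"line {lineno}: expected 2 tab-separated columns, got {row!r}")
    return problems


def check_presence_absence(path: str) -> None:
    """Raise one ValueError listing every problem row, instead of failing later."""
    with open(path) as fh:
        problems = find_bad_rows(fh)
    if problems:
        raise ValueError(f"{path} is not a 2-column tab-delimited file:\n" + "\n".join(problems))
```

Collecting all problems before raising means a user sees every bad line in one run rather than fixing them one at a time.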
Happy co-occurrencing!
Hello!
I recently installed Coinfinder and have been running into issues getting it to run. Below is the command I am using to run the program.
`coinfinder -i genes_presence_absence_coinfinder.txt -p Marine_AOA_iqtree.treefile -o /COINFINDER_OUTPUT/Marine_AOA_associate -a -m`
However, I run into the following error related to my gene presence/absence file. It's hard for me to tell what might be formatted incorrectly in the file I am providing. I have a tab-delimited list of genes and the MAGs they are present in, but I am not sure what might be causing the error.
Reading arguments...
Thank you for your time and advice!