IEDB / arborist


make proteome + make protein #1

Closed by dmx2 8 months ago

dmx2 commented 11 months ago

make proteome for selecting all proteomes works now.

We will need to sort out the proteome.tsv file issue and make the output of select_proteome.py the actual Makefile target.

make protein is still in the works, but getting there.

My protein tree codebase is now a submodule, so any edits I make to that codebase will be pushed there, and Arborist will then have to pull the changes.

jamesaoverton commented 10 months ago

When I run make proteome I get this error:

  File "/home/joverton/arborist/src/protein_tree/src/sql_engine.py", line 12, in create_sql_engine
    return create_engine(f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{database}")

It fails because I don't have port set, but I don't want to connect to MySQL when running Arborist.

dmx2 commented 10 months ago

Yep, I can fix this for make proteome, no problem. Don't you still need the connection when running make iedb, and hence make all, for a full start-to-end build?
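
One possible shape for the fix, as a minimal sketch rather than the actual protein tree code: build the connection URL only when a step really needs the database, and fall back to MySQL's default port 3306 when no port is set. The environment variable names (`MYSQL_USER` and so on) and the helper `build_mysql_url` are hypothetical; the real configuration may be read differently.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

DEFAULT_MYSQL_PORT = "3306"  # MySQL's standard port


def build_mysql_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a SQLAlchemy-style MySQL URL, or None if no database is configured.

    All variable names below are assumptions made for illustration.
    """
    env = os.environ if env is None else env
    required = ("MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_HOST", "MYSQL_DATABASE")
    if any(not env.get(name) for name in required):
        # Nothing to connect to: callers such as `make proteome` can skip the DB.
        return None
    port = env.get("MYSQL_PORT") or DEFAULT_MYSQL_PORT
    # Escape the password so characters such as '@' or ':' do not break the URL.
    password = quote_plus(env["MYSQL_PASSWORD"])
    return (
        f"mysql+mysqlconnector://{env['MYSQL_USER']}:{password}"
        f"@{env['MYSQL_HOST']}:{port}/{env['MYSQL_DATABASE']}"
    )
```

With this shape, only the steps that query the database would pass the result to `create_engine`, and a `None` result would tell the other steps to skip the connection.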

jamesaoverton commented 9 months ago

I pulled the latest commits but I'm still getting the same error.

dmx2 commented 9 months ago

@jamesaoverton Sorry! This is likely because the protein tree submodule hadn't been updated. I made a lot of commits to that codebase and have now pushed the updated version, so it should work. The only problem I see is that makeblastdb and blastp are not in bin/ after running make deps, so this might need to be fixed.

Also, if you use git pull, make sure to run git pull --recurse-submodules so the submodule updates.

jamesaoverton commented 9 months ago

make proteome works for me now. make protein is not working yet. I'd like two changes, please:

  1. Don't require binaries to be in bin/. The Makefile checks that all the required binaries are on the PATH, which supports installation via system packages; otherwise it installs them to bin/. I prefer the system packages for BLAST and HMMER.
  2. Add build/arborist/manual_assignments.tsv to the repository, or add a rule to fetch it in the Makefile.
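
In Python terms, the lookup order described in item 1 could look roughly like the sketch below: check the PATH first, then fall back to the local bin/ directory. `find_binary` is a hypothetical helper, not something that exists in the repository.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_binary(name: str, local_bin: Path = Path("bin")) -> Optional[str]:
    """Prefer a binary on PATH, then fall back to local_bin; None if neither has it."""
    on_path = shutil.which(name)
    if on_path:
        # System package installs (e.g. BLAST, HMMER) are found here.
        return on_path
    candidate = local_bin / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    return None
```

For example, `find_binary("blastp")` would return the system-installed path when BLAST comes from a package, and only look in bin/ when it does not.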

dmx2 commented 9 months ago

make protein should work now with your requested changes. Let me know how it goes.

jamesaoverton commented 9 months ago

I think that manual-parents.tsv is missing a column. I pulled the latest code and ran make protein. It fetches the Google Sheet but fails with this error:

  File "/home/joverton/arborist/src/protein_tree/protein_tree/assign.py", line 425, in _assign_manuals
    manual_gene_map = manual_df.set_index('Accession')['Accession Gene'].to_dict()
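
A header check before that lookup would make this kind of failure easier to diagnose. The following is a standard-library sketch, assuming the sheet is saved as a tab-separated file with a header row; `missing_columns` is a hypothetical helper and not part of `assign.py`, which uses a DataFrame instead.

```python
import csv
from typing import Iterable, List


def missing_columns(tsv_path: str, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the TSV header row."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header, so every required name is reported.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

Calling it with `["Accession", "Accession Gene"]` before building the gene map would report exactly which column the fetched sheet lacks.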

dmx2 commented 9 months ago

@jamesaoverton Yes, I ran into this problem a few days ago and I've asked Randi to add the gene symbols to the SoT sheet. I CC'd you on the email thread. For now, we will make a copy of the sheet and pull from there, and I'll add the gene symbols to the copy.

dmx2 commented 9 months ago

Ok, I made a copy and I changed the URL - it should work now.... hopefully 🤞

dmx2 commented 8 months ago

Protein tree is no longer a submodule; it now lives in a separate directory. I'm merging this, and then we can continue to change the Makefile for make proteome and make protein as needed.