3dem / model-angelo

Automatic atomic model building program for cryo-EM maps
MIT License

Chain and residue numbering issues #97

Open ccgauvin94 opened 5 months ago

ccgauvin94 commented 5 months ago

Hi, if I submit a single protein sequence, I get out multiple chains even though the sequence is just a single chain. Additionally, the chain numbering doesn't make sense. The largest (first) chain, Aa, starts with _atom_site.label_seq_id set to 75, which is correct, but the end of the chain is numbered 311 despite being residue 433. Presumably there are breaks in the chain, but ModelAngelo isn't incrementing label_seq_id when it encounters a break.

Additionally, the other, smaller chains, despite being from the same sequence, begin their numbering at 1 (even though they don't start at the first residue of the sequence), and wind up completely out of register with the actual protein sequence.

Is this the intended behavior? Is it possible to keep the residue numbering in register with the sequence?
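To illustrate what "in register" means here, the following is a minimal sketch, not ModelAngelo's own logic. It assumes a built fragment's one-letter sequence occurs exactly once, without mismatches, in the submitted sequence. The function name `register_fragment` is hypothetical.

```python
def register_fragment(full_sequence: str, fragment: str) -> list[int]:
    """Return the 1-based sequence position of each residue in `fragment`.

    Assumption (for illustration only): the fragment matches the full
    sequence exactly and at a single location. Real models may need a
    proper alignment that tolerates mismatches.
    """
    if not fragment:
        raise ValueError("fragment is empty")
    start = full_sequence.find(fragment)
    if start == -1:
        raise ValueError("fragment not found in sequence")
    # A second hit means the placement is ambiguous, so refuse to guess.
    if full_sequence.find(fragment, start + 1) != -1:
        raise ValueError("fragment matches more than one position")
    return [start + 1 + i for i in range(len(fragment))]


# Example: a five-residue fragment that begins at residue 7 of the sequence.
positions = register_fragment("MKTAYIAKQRQISFVKSHFSRQ", "AKQRQ")
```

With numbering like this, a fragment that does not start at the first residue would keep its position in the full sequence instead of restarting at 1.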

jamaliki commented 4 months ago

Hi @ccgauvin94 ,

Is this the result of building a model with model_angelo build or model_angelo build_no_seq? If you are using model_angelo build, is the output file you are looking at output.cif or output_raw.cif? This behaviour is expected for the raw output, since it also includes chains that are not found in the user sequence, but it is not intended behaviour for output.cif, which should be in register with the sequence provided.

When ModelAngelo encounters a break in the chain and models it as such, it should also advance label_seq_id to account for the residues that are missing.
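One way to check whether a break was modelled without the numbering being advanced is to look for residues with consecutive label_seq_id values whose CA atoms are too far apart to be bonded neighbours. The sketch below is a diagnostic only, not part of ModelAngelo. The function name, the input format (a list of `(label_seq_id, (x, y, z))` tuples in chain order) and the 4.2 Å cutoff are all assumptions. Adjacent CA atoms are typically about 3.8 Å apart.

```python
import math

# Assumed cutoff in Å; adjacent CA atoms are typically ~3.8 Å apart.
CA_CA_MAX = 4.2


def find_unnumbered_breaks(residues, cutoff=CA_CA_MAX):
    """Return (seq_id_a, seq_id_b) pairs that are numbered consecutively
    but whose CA atoms are farther apart than `cutoff`.

    `residues` is a list of (label_seq_id, (x, y, z)) tuples for the CA
    atoms of one chain, in chain order (hypothetical input format).
    """
    suspicious = []
    for (id_a, xyz_a), (id_b, xyz_b) in zip(residues, residues[1:]):
        if id_b - id_a == 1 and math.dist(xyz_a, xyz_b) > cutoff:
            suspicious.append((id_a, id_b))
    return suspicious


# Toy coordinates: 76 -> 77 is numbered contiguously but spans a large gap,
# while 77 -> 80 already skips numbers and is therefore not reported.
example = [
    (75, (0.0, 0.0, 0.0)),
    (76, (3.8, 0.0, 0.0)),
    (77, (20.0, 0.0, 0.0)),
    (80, (23.8, 0.0, 0.0)),
]
breaks = find_unnumbered_breaks(example)
```

Any pair reported this way marks a place where the coordinates show a gap but the numbering does not.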

Could you also please tell me which ModelAngelo version you are running?

Best, Kiarash.

ccgauvin94 commented 4 months ago

Hi @jamaliki - thank you for both the wonderful software, and also taking a look at this issue.

In this case, I ran model_angelo build -v filename.mrc -pf protein_sequence.fasta -o output_directory

In the output directory, I see both the output.cif and output_raw.cif files. The raw file has dozens of chains, while output.cif is the file I described earlier.

I see this behavior with the latest standalone release (1.0.12), as well as whatever release was distributed with Relion 5.

I could potentially provide you with the map, sequence, and model files, but I'd have to just double check that with the primary investigator.

builab commented 2 months ago

I encountered the same thing when trying to model a ribosome map. Is there a solution?

jamaliki commented 1 month ago

Hi @ccgauvin94 and @builab ,

I want to apologize for the delay in my response. Could you please provide me with the map and sequence files if you are comfortable with that (email me at kjamali@mrc-lmb.cam.ac.uk)? I can then try to debug the issue.

Best, Kiarash.