RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

hla.p.json #92

Closed drkoryjohns closed 1 year ago

drkoryjohns commented 2 years ago

Hello,

I am trying to run arcasHLA using the test commands provided on the git page, but I get the following error:

Traceback (most recent call last):
  File "/arcasHLA/scripts/genotype.py", line 707, in
    with open(hla_json, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/arcasHLA/scripts/../dat/ref/hla.p.json'

When I inspect the reference.py script, it appears there is this line that is commented out:

https://github.com/RabadanLab/arcasHLA/blob/master/scripts/reference.py#L458

Is that the issue? Should this line be uncommented, or is there another way to circumvent the error?

Thank you,

Kory Johnson

YuJiandongBio commented 1 year ago

Same problem, any solution?

olekskrav commented 1 year ago

Hi Kory. Thanks for the interest in our tool, and sorry for the delay in replying!

Unfortunately, we were not able to reproduce the error. Can you let us know which version of arcasHLA you used, and which specific commands led to the error? We recommend using the latest version built with our official docker file as described here.

Also, just to provide some insight: the file hla.p.json should be generated during the reference step, which is performed implicitly by the check_ref() function call in genotype.py (currently at line 700).

Another way to generate the missing file is to manually run the reference command first as follows:

arcasHLA reference -v

The reference command clones the IMGT/HLA database and performs the indexing needed later for genotyping. In particular, it should create the files hla.fasta, hla.idx and hla.p.json in the dat/ref folder. (The -v option lets you observe the progress of the command.)
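To check whether the reference step produced everything it should, a small script along these lines may help. This is only a sketch: the three file names and the dat/ref folder come from the description above, while the helper name missing_reference_files is hypothetical and not part of arcasHLA.

```python
from pathlib import Path

# File names as listed above; the helper below is a hypothetical
# convenience function, not part of the arcasHLA code base.
EXPECTED_REF_FILES = ("hla.fasta", "hla.idx", "hla.p.json")


def missing_reference_files(ref_dir):
    """Return the expected reference files that are absent from ref_dir."""
    ref_dir = Path(ref_dir)
    return [name for name in EXPECTED_REF_FILES
            if not (ref_dir / name).is_file()]
```

Calling missing_reference_files("dat/ref") from the arcasHLA installation directory returns an empty list when all three files exist. Any names it returns suggest that the reference step did not finish, in which case rerunning arcasHLA reference -v is the first thing to try.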

Please let us know if you observe any issue with running the reference step and if all these files are successfully generated.

As for the commented-out line you mentioned, it should not be related to the issue. It was commented out in the update that switched the intermediary files from pickle to JSON format.

drkoryjohns commented 1 year ago

Thanks for getting back to me! The team that supports the HPC I use, NIH Biowulf, installed another instance and it worked no issue! Best, Kory

Kory R. Johnson, MS, PhD Bioinformatics Core Director, Information Technology Program (ITP), Division of Intramural Research (DIR), National Institute of Neurological Disorders & Stroke (NINDS), National Institutes of Health (NIH), Bethesda, Maryland

olekskrav commented 1 year ago

Great, thanks for letting us know!