RomoL2 / RegVar

Characterize 3'UTR variants

Trouble using CharacterizeVariants functions #7

Closed pamelaxu213 closed 7 months ago

pamelaxu213 commented 7 months ago

Hi,

I installed RegVar in a conda environment and also installed RBPamp successfully. I am getting this error message:

> CharacterizeVariants_single_input("../test/")
Enter 3'UTR single nucleotide variant chromosome number: 1
Enter variant position in 1-based coordinates: 45015501
Enter hg38 reference base (capitalized) at variant position: G
Enter variant base (capitalized): A
Enter a name for this variant (no spaces): test
[1] "incorporating eCLIP"
[1] "incorporating eQTLS"
[1] "incorporating GWAS"
[1] "incorporating microRNAs"
[1] "incorporating RBP motifs"
Error: Can't assign 6 names to a 7-column data.table
In addition: Warning message:
In rm(miR_predictions) : object 'miR_predictions' not found

I am wondering what might be wrong here.

Greatly appreciate your help!

Thank you!

RomoL2 commented 7 months ago

I think I know what's wrong. Would you be able to send me a mock file (like the first 10 lines or something) and I can fix it?

Lindsay

RomoL2 commented 7 months ago

Nevermind haha, I see it's with the single input function. Let me take a look and fix it. I'll let you know once it's fixed.

RomoL2 commented 7 months ago

It is working for me with your variant. It looks like you may not have all the extdata files loaded for some reason (judging by the warning from rm(miR_predictions): that object is an intermediate that should exist but might not if one of the files is missing). Would you be able to confirm that you have the extdata folder and all the files that are supposed to be in it?

This is what you should see as output (see below).

CharacterizeVariants_single_input('~')
Enter 3'UTR single nucleotide variant chromosome number: 1
Enter variant position in 1-based coordinates: 45015501
Enter hg38 reference base (capitalized) at variant position: G
Enter variant base (capitalized): A
Enter a name for this variant (no spaces): test1
gunzip: clinvar_for_script.bed already exists -- skipping
[1] "incorporating eCLIP"
[1] "incorporating eQTLS"
[1] "incorporating GWAS"
[1] "incorporating microRNAs"
[1] "incorporating RBP motifs"
processing HNRNPA0.txt
processing SF1.txt
processing MBNL1.txt
processing HNRNPA1.txt
processing DAZ3.txt
processing RALY.txt
processing ESRP1.txt
processing CELF1.txt
processing TARDBP.txt
processing RBM15B.txt
processing MSI1.txt
processing SRSF5.txt
processing A1CF.txt
processing PTBP3.txt
processing SRSF4.txt
processing PRR3.txt
processing RC3H1.txt
processing TRNAU1AP.txt
processing RBMS3.txt
processing HNRNPK.txt
processing RBMS2.txt
processing UNK.txt
processing ZCRB1.txt
processing TIA1.txt
processing RBFOX3.txt
processing RBFOX2.txt
processing PUM1.txt
processing SRSF2.txt
processing HNRNPDL.txt
processing KHDRBS2.txt
processing HNRNPL.txt
processing PUF60.txt
processing PUM2.txt
processing KHDRBS3.txt
processing FUBP3.txt
processing CPEB1.txt
processing NUPL2.txt
processing BOLL.txt
processing SFPQ.txt
processing ILF2.txt
processing RBM6.txt
processing RBM4.txt
processing RBM24.txt
processing SNRPA.txt
processing HNRNPC.txt
processing RBM25.txt
processing FUBP1.txt
processing NOVA1.txt
processing HNRNPF.txt
processing FUS.txt
processing KHSRP.txt
processing HNRNPH2.txt
processing SRSF9.txt
processing TRA2A.txt
processing PABPN1L.txt
processing RBM22.txt
processing ELAVL4.txt
processing HNRNPD.txt
processing RBM23.txt
processing DAZAP1.txt
processing SRSF8.txt
processing PCBP4.txt
processing CNOT4.txt
processing RBM4B.txt
processing ZNF326.txt
processing RBM45.txt
processing RBM47.txt
processing ZFP36.txt
processing PCBP2.txt
processing IGF2BP2.txt
processing HNRNPA2B1.txt
processing SRSF11.txt
processing TAF15.txt
processing EWSR1.txt
processing HNRNPCL1.txt
processing SRSF10.txt
processing PCBP1.txt
processing IGF2BP1.txt
processing EIF4G2.txt
processing RBM41.txt
[1] "fetching CADD scores for gnomAD variants"
--------------------------------------------------
==================================================
[1] "incorporating ClinVar"
[1] "incorporating APA info"
[1] "incorporating conservation scores"
==================================================
[1] "predicting GWAS or eQTLs"
--------------------------------------------------
==================================================
==================================================

[1] "formatting output"
[1] "compressing files; this may take a bit"
gzip: clinvar_for_script.bed.gz already exists -- skipping

RomoL2 commented 7 months ago

This is what you should see in the folder (or unzipped versions of all of these files, depending on where your code ended):

(base) @.*** extdata % ls

GTEx_v8_finemapping_DAPG_UTR_for_script.bed.gz

GTEx_v8_finemapping_DAPG_scottUTR_processed.txt.gz

GTEx_v8_finemapping_DAPG_scottUTR_processed_expanded_miR_info.txt.gz

GWAS_UTR_for_script.bed.gz

LR_all_APA_peak_coords_hg38_by_base.phylop_100.phylop_17.phastcons_100.phastcons_17.txt.gz

RBPamp

RBPamp_aff_local.py

all_APA_peak_coords_hg38.bed.gz

all_gd_UTR_vars.bed.gz

clinvar_for_script.bed

clinvar_for_script.bed.gz

eCLIP_GRCh38_reps_1_2_narrow_peak_ALL_RBP_cell_line_pairs.RBPs_with_amp_NEW_motifs.lines_assayed.lines_w_peak.NEW_aff_sum_peak_pm_10.in_top_motif.BY_BASE_BY_RBP.bed.gz

eCLIP_peaks.GRCh38.IDR_pass.main_chr.HepG2.full_name.bed.gz

eCLIP_peaks.GRCh38.IDR_pass.main_chr.K562.full_name.bed.gz

gwas_scottUTRvars_processed.txt.gz

hg38.fa

hg38.fa.fai

hg38.fa.gz

hg38_miR_predictions_final.bed.gz

hg38_miR_predictions_final.txt.gz

motifs2

requirements.txt.gz

scott_vcf.bed.3pseq_intersected.txt.gz
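
Comparing a folder against a listing like the one above by eye is error-prone, so a small shell check may help. The following is only a sketch: the function name `check_extdata` and the demo directory are made up for illustration, and a file is treated as present if either the plain name or its `.gz` form exists, since unzipped versions are also acceptable.

```shell
# Sketch: report which expected files are absent from an extdata folder.
# A file counts as present if either "NAME" or "NAME.gz" exists.

# check_extdata DIR NAME...  -> prints one missing base name per line
check_extdata() {
  extdata_dir="$1"; shift
  for base_name in "$@"; do
    if [ ! -e "$extdata_dir/$base_name" ] && [ ! -e "$extdata_dir/$base_name.gz" ]; then
      printf '%s\n' "$base_name"
    fi
  done
}

# Demo on a throwaway directory (a stand-in for the real extdata folder).
demo_dir="$(mktemp -d)"
touch "$demo_dir/hg38.fa" "$demo_dir/GWAS_UTR_for_script.bed.gz"
missing="$(check_extdata "$demo_dir" \
  hg38.fa GWAS_UTR_for_script.bed hg38_miR_predictions_final.bed)"
echo "missing: $missing"
rm -rf "$demo_dir"
```

In real use the first argument would be the package's extdata path and the remaining arguments the base names from the listing.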

pamelaxu213 commented 7 months ago

Thanks for checking! :)

I checked and the only file missing in the extdata folder is requirements.txt.gz. The two hg38_miR_predictions_final files are there. I also have some temporary intermediate files which should be fine.

I don't think the function uses requirements.txt.gz though...

Wondering if you have any insights into the message 'Error: Can't assign 6 names to a 7-column data.table'?

RomoL2 commented 7 months ago

You're right, you don't need requirements.txt (it's a file for setting up RBPamp, but if you were missing that you should have gotten an error when you were building the conda environment for RBPamp). One of your intermediates (vcf_UTR entering the RBPamp step) has an extra column for some reason. I'm not sure why, since it's not happening on my computer. I'm guessing the culprit is these lines of code:

intersection <- data.table::fread("merged_by_strand_nat_sorted_vcf_UTR_tmp.bed", header = FALSE)
names(intersection) <- names_BED_std

names_BED_std is 6 columns (names_BED_std <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand")), but for some reason your intermediate has 7 columns.
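
One quick way to confirm a mismatch like this outside R is to count the fields in the intermediate BED file directly. This is a rough sketch, assuming the file is tab-separated (as bedtools output normally is); the helper name `bed_column_counts` and the two demo files are invented for illustration and are not part of RegVar.

```shell
# Sketch: list the distinct column counts found in a tab-separated BED file.

# bed_column_counts FILE -> prints each distinct field count, one per line
bed_column_counts() {
  awk -F'\t' '{ print NF }' "$1" | sort -n | uniq
}

# Two illustrative one-line files: one with the expected 6 columns and
# one with an extra strand-like column, giving 7.
tmp_dir="$(mktemp -d)"
printf 'chr1\t45015500\t45015501\tG_A\tENSG00000126088\t+\n' > "$tmp_dir/six.bed"
printf 'chr1\t45015500\t45015501\t+\tG_A\tENSG00000126088\t+\n' > "$tmp_dir/seven.bed"

six_cols="$(bed_column_counts "$tmp_dir/six.bed")"
seven_cols="$(bed_column_counts "$tmp_dir/seven.bed")"
echo "six.bed: $six_cols columns; seven.bed: $seven_cols columns"
rm -rf "$tmp_dir"
```

Running the same function on merged_by_strand_nat_sorted_vcf_UTR_tmp.bed would show whether it has the 6 columns that names_BED_std expects.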

I know this is a bit of a pain but instead of running the function, would you be able to run the attached R code (for your variant up to the RBPamp step) and then show me what vcf_UTR looks like once you get the error? Then we will be able to figure out what is wrong with the intermediate and fix it.

Lindsay

RomoL2 commented 7 months ago

(This is what vcf_UTR should look like after running all of this code, assuming it didn't hit an error along the way):

vcf_UTR
                       var_id              motif_RBPs         motif_cat chrom chromStart chromEnd
1: chr1_4501550045015501+_G_A HNRNPK_TARDBP_HNRNPA2B1 lostlostpreserved  chr1   45015500 45015501
     info isoStart  isoStop            gene strand number_isos iso_loc UTRstart  UTRstop
1: GAtest 45015499 45015568 ENSG00000126088      +           1       1 45015499 45015575
               eclip_tot ref alt tmp_key eqtl_info gwas_info miR_info
1: RPS3_K562__SUB1_HepG2   G   A       1        NA        NA       NA

pamelaxu213 commented 7 months ago

Here are the outputs:

vcf_UTR:

chrom | chromStart | chromEnd | info | isoStart | isoStop | gene | strand | number_isos | iso_loc | UTRstart | UTRstop | eclip_tot | ref | alt | tmp_key | eqtl_info | gwas_info | miR_info | RBPamp_info

chr1 | 45015500 | 45015501 | GAtest | 45015499 | 45015568 | ENSG00000126088 | + | 1 | 1 | 45015499 | 45015575 | RPS3_K562__SUB1_HepG2 | G | A | 1 | NA | NA | NA | G_A

intersection:

V1 | V2 | V3 | V4 | V5 | V6 | V7

chr1 | 45015500 | 45015501 | + | G_A | ENSG00000126088 | +

names_BED_std:

'chrom' 'chromStart' 'chromEnd' 'name' 'score' 'strand'

RomoL2 commented 7 months ago

So strange, bedtools seems to be adding an extra column. My intersection looks like this. What bedtools version are you using? Maybe they updated something that is adding an extra column. I am using

bedtools v2.30.0

intersection
     V1       V2       V3  V4              V5 V6
1: chr1 45015500 45015501 G_A ENSG00000126088  +
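
For anyone comparing their own setup, the installed version can be checked against the one reported working here. This is a sketch only: `bedtools --version` is the real command, but the helper functions are hypothetical, the threshold 2.30.0 is simply the version mentioned in this comment rather than a documented minimum, and the demo uses a fixed example string so it runs without bedtools installed.

```shell
# Sketch: compare a reported bedtools version against a reference version.

# version_at_least HAVE WANT -> exit status 0 if HAVE >= WANT
version_at_least() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# parse_bedtools_version LINE -> prints e.g. "2.30.0" from "bedtools v2.30.0"
parse_bedtools_version() {
  printf '%s\n' "$1" | sed -n 's/^bedtools v\([0-9][0-9.]*\).*$/\1/p'
}

# In real use: reported="$(bedtools --version)"
reported="bedtools v2.27.1"   # example string standing in for an older release
have="$(parse_bedtools_version "$reported")"
if version_at_least "$have" "2.30.0"; then
  status="ok"
else
  status="older than 2.30.0"
fi
echo "bedtools $have: $status"
```

The comparison relies on `sort -V` (version sort), which is available in GNU coreutils.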

RomoL2 commented 7 months ago

If you would like, I can send you an R script that will fix that problem for you, but I am not sure why that extra column is being added. I can just add a line to delete it.

Lindsay

pamelaxu213 commented 7 months ago

You are right, it's bedtools. I did not realize I was using a pretty old version: bedtools v2.27.1

The error is gone after I used a more recent version. Thank you so much!!

However, now I am getting

> CharacterizeVariants_single_input('/hpf/largeprojects/ccmbio/pxu/NonCoding/test/')
Enter 3'UTR single nucleotide variant chromosome number: 1
Enter variant position in 1-based coordinates: 45015501
Enter hg38 reference base (capitalized) at variant position: G
Enter variant base (capitalized): A
Enter a name for this variant (no spaces): test
[1] "incorporating eCLIP"
[1] "incorporating eQTLS"
[1] "incorporating GWAS"
[1] "incorporating microRNAs"
[1] "incorporating RBP motifs"
processing HNRNPDL.txt
Error in py_run_file_impl(file, local, convert) : 
    File "/hpf/largeprojects/ccmbio/pxu/conda_envs/RBPamp/lib/python3.7/site-packages/RBPamp-0.9.20-py3.7-linux-x86_64.egg/RBPamp/params.py", line 0
SyntaxError: unknown encoding: future_fstrings
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In rm(miR_predictions) : object 'miR_predictions' not found

It might be a Python version problem. In my RBPamp conda environment, I have python3.7. Some posts online say python3.9 and below are not compatible with future-fstrings (https://discourse.acados.org/t/python-error-syntaxerror-encoding-problem-future-fstrings/820). Which Python version do you have for RBPamp?

Thanks so much for taking the time here!

Pamela

RomoL2 commented 7 months ago

Yes, this is because future-fstrings has to be installed (I think this was missing from the old requirements file). On the command line, run this and then try again:

pip install future-fstrings

RomoL2 commented 7 months ago

Sorry, that may have to be run inside the RBPamp conda environment:

conda activate RBPamp

pip install future-fstrings

pamelaxu213 commented 7 months ago

Hmm, future-fstrings is already present in my RBPamp environment:

Requirement already satisfied: future-fstrings in ./RBPamp/lib/python3.7/site-packages/future_fstrings-1.2.0-py3.7.egg (1.2.0)

I'm not sure what Python version was installed in your RBPamp environment. If you have python3.7 and up, it's probably better to remove the line # -*- coding: future_fstrings -*- according to https://discourse.acados.org/t/python-error-syntaxerror-encoding-problem-future-fstrings/820

After I removed that line in the script, it is running to almost the end, Hooray! Waiting for it to finish completely. It's taking quite a while for one variant ..

Thanks so much for your help!
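
The manual edit described in this comment could be scripted roughly as follows. This is a sketch, not part of RegVar or RBPamp: the function names are invented, the demo works on a temporary stand-in file rather than the real params.py, and it only looks at the first two lines because that is where Python recognizes an encoding declaration. Since f-strings are built into Python 3.6 and later, dropping the declaration should not affect f-string parsing there.

```shell
# Sketch: detect and remove a "coding: future_fstrings" declaration.

# has_fstrings_cookie FILE -> exit 0 if one of the first two lines declares it
has_fstrings_cookie() {
  head -n 2 "$1" | grep -q 'coding: future_fstrings'
}

# strip_fstrings_cookie FILE -> rewrite FILE without it, keeping FILE.bak
strip_fstrings_cookie() {
  cp "$1" "$1.bak"
  sed '1,2{/coding: future_fstrings/d;}' "$1.bak" > "$1"
}

# Demo on a temporary stand-in file (hypothetical contents).
demo_file="$(mktemp)"
printf '%s\n' '# -*- coding: future_fstrings -*-' 'x = 1' > "$demo_file"

if has_fstrings_cookie "$demo_file"; then before="yes"; else before="no"; fi
strip_fstrings_cookie "$demo_file"
if has_fstrings_cookie "$demo_file"; then after="yes"; else after="no"; fi
remaining="$(cat "$demo_file")"

echo "cookie before: $before, after: $after"
rm -f "$demo_file" "$demo_file.bak"
```

Keeping the `.bak` copy makes it easy to restore the original file if the change turns out not to help.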

RomoL2 commented 7 months ago

Ah, ok, maybe that's the problem- thanks for the tip! If it gets through RBPamp it should work, that's always the sticking point (lots of code in there to go wrong, haha).

Lindsay

IvantheDugtrio commented 7 months ago

Hi RomoL2, I'm also running through this and finding some of the same issues walking through a new installation. I also encountered the f_strings unknown encoding issue, and then this issue after removing that line from params.py:

Error in py_run_file_impl(file, local, convert) :
    ImportError: /hom/vlab/miniconda3/envs/RBPamp/lib/python3.7/site-packages/RBPamp-0.9.20-py3.7-linux-x86_64.egg/RBPamp/cy/cy_model.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZGVdN4v_log

Thanks for your help!

RomoL2 commented 7 months ago

A few people are running into this issue, so I am going to check with Marvin Jens (who wrote the RBPamp code) and get back to you shortly.

Lindsay

RomoL2 commented 7 months ago

Ok, sorry this took some time but I think the issue is fixed. Try redownloading and let me know if you are still running into issues (it was a cython problem, but I think I have fixed it). You can also use this dockerfile which is working for me:

FROM continuumio/miniconda3:23.3.1-0
MAINTAINER Lindsay Romo @.***>

LABEL \
    version="0.0.1" \
    description="Image for RegVar(https://github.com/RomoL2/RegVar)"

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    build-essential \
    curl \
    git \
    gcc-4.8 \
    git-lfs \
    libcurl4-openssl-dev \
    libfontconfig1-dev \
    libfreetype6-dev \
    libfribidi-dev \
    libharfbuzz-dev \
    libjpeg-dev \
    libpng-dev \
    libssl-dev \
    libtiff5-dev \
    libxml2-dev \
    python3 \
    bedtools \
    python3-pip \
    r-base \
    vim \
    wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN conda install -n base -c conda-forge mamba

RUN R -e "install.packages('devtools', dependencies=TRUE)"
RUN R -e "devtools::install_github('RomoL2/RegVar')"

WORKDIR /usr/local/lib/R/site-library/RegVar
RUN rm -r extdata \
    && wget https://zenodo.org/record/10646785/files/extdata.tar.gz \
    && tar -xf extdata.tar.gz \
    && rm extdata.tar.gz \
    && cd extdata \
    && wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz \
    && gunzip hg38.fa.gz

# install RBPamp

WORKDIR /usr/local/lib/R/site-library/RegVar/extdata/RBPamp
RUN mamba create --name RBPamp --file requirements.txt -c conda-forge --yes

RUN /opt/conda/envs/RBPamp/bin/pip install future-fstrings --force-reinstall \
    && export CC=gcc \
    && /opt/conda/envs/RBPamp/bin/python setup.py build \
    && /opt/conda/envs/RBPamp/bin/python setup.py install

WORKDIR /

IvantheDugtrio commented 7 months ago

@RomoL2 thanks for the update, it looks like this fixed the runtime issues.