Quantification of PAS usage by APAIQ

CTLife commented 9 months ago

How do I quantify PAS usage?

Quantification is very important for me; it is the whole purpose of my analysis.

"python APAIQ/regression/evaluateRegression.v.2.py -h" showed:

usage: evaluateRegression.v.2.py [-h] --model MODEL --factor_path FACTOR_PATH [--input_file INPUT_FILE] [--input_plus INPUT_PLUS] [--input_minus INPUT_MINUS] [--pas_file PAS_FILE] [--out OUT] [--threshold THRESHOLD] [--depth DEPTH] [--window WINDOW] [--genome GENOME]

Evaluate each locus with RNAseq coverage exceed threshold and return prediction score.

optional arguments:
  -h, --help  show this help message and exit
  --model MODEL  the model weights file
  --factor_path FACTOR_PATH  normalization file path
  --input_file INPUT_FILE  unstranded bedGraph file
  --input_plus INPUT_PLUS  plus strand bedGraph file
  --input_minus INPUT_MINUS  minus strand bedGraph file
  --pas_file PAS_FILE  pAs location file to be predicted expression level
  --out OUT  output file path
  --threshold THRESHOLD  peak length lower than threshold will be fiter out
  --depth DEPTH  total number of mapped reads( in millions)
  --window WINDOW  input length
  --genome GENOME  assembly name of the genome. i.e. hg19, hg38, mm10
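Reading the usage line the way argparse prints it, only --model and --factor_path appear without square brackets, so those two are the required options and everything else (including --genome) is optional. The sketch below assembles an argument list for apaiq_reg from a subset of those options. It assumes apaiq_reg takes the same options as evaluateRegression.v.2.py (the traceback further down shows apaiq_reg running evaluateRegression_v2.py); the function name and all file names are hypothetical placeholders.

```python
from typing import List, Optional


def build_apaiq_reg_command(
    model: str,
    factor_path: str,
    pas_file: Optional[str] = None,
    input_plus: Optional[str] = None,
    input_minus: Optional[str] = None,
    input_file: Optional[str] = None,
    out: Optional[str] = None,
    depth: Optional[float] = None,
    genome: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an apaiq_reg argument list.

    --model and --factor_path are always included because they are the
    unbracketed (required) options in the usage line; every other option
    is appended only when a value is supplied.
    """
    cmd = ["apaiq_reg", "--model", model, "--factor_path", factor_path]
    optional = {
        "--pas_file": pas_file,
        "--input_plus": input_plus,
        "--input_minus": input_minus,
        "--input_file": input_file,
        "--out": out,
        "--depth": depth,
        "--genome": genome,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own files.
    cmd = build_apaiq_reg_command(
        "model_weights_file", "normalize_factor",
        pas_file="pas.bed", input_plus="plus.bedGraph",
        input_minus="minus.bedGraph", out="pas_rpm.txt",
    )
    print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to actually launch the tool.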

For FACTOR_PATH, should I change the normalize_factor file for my data? https://github.com/christear/APAIQ_release/tree/main/regression

Is PAS_FILE the output of APAIQ?

Is --genome required?

Could you please write a wiki page or detailed steps for APAIQ? Thanks.

CTLife commented 9 months ago

I tried both evaluateRegression.v.2.py and apaiq_reg, but got an error:

Evaluation based on regression model
Loading RNAseq coverage
Reading facotrs from APAIQ/regression/normalize_factor
loading the PAS list from 1_APAIQ_All60/mRNA_SRR8462062_GSM3572799_NA18498_Input.predicted.txt
Traceback (most recent call last):
  File "/home/yp/.conda/envs/apaiq_env/bin/apaiq_reg", line 33, in <module>
    sys.exit(load_entry_point('apaiq==1.2.0', 'console_scripts', 'apaiq_reg')())
  File "/home/yp/.conda/envs/apaiq_env/lib/python3.7/site-packages/apaiq-1.2.0-py3.7.egg/apaiq/evaluateRegression_v2.py", line 111, in main
ValueError: could not convert string to float: 'chr1:+:1'

The 5th column of the APAIQ output is a string, not a number. Why does apaiq_reg convert the 5th column to float?

christear commented 9 months ago

Please use common BED-format files as input for the regression model; in these the 5th column should be a number. The APAIQ output is in bedGraph format, which is convenient for loading into a genome browser for visualization. I will add more details about the regression step.
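Before running the regression it can help to check that the PAS file really has a numeric 5th column, since that is exactly what failed in the traceback above. The sketch below is a hypothetical checker, not part of APAIQ: it assumes a tab-separated BED file, ignores blank, "#", "track" and "browser" lines, and reports every other line whose 5th column cannot be parsed as a number.

```python
from typing import Iterable, List, Tuple


def find_non_numeric_scores(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, offending_value) for data lines whose 5th
    column (the BED score) is missing or not a number."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        fields = stripped.split("\t")
        if len(fields) < 5:
            bad.append((lineno, "<fewer than 5 columns>"))
            continue
        try:
            float(fields[4])
        except ValueError:
            bad.append((lineno, fields[4]))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python check_bed.py my_pas.bed  (hypothetical script name)
    with open(sys.argv[1]) as fh:
        for lineno, value in find_non_numeric_scores(fh):
            print(f"line {lineno}: 5th column is {value!r}")
```

A value such as 'chr1:+:1' showing up in this report means the file is not yet in the plain BED layout the regression model expects.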

CTLife commented 9 months ago

So the 5th column of the BED file should be the 4th column of the APAIQ output? And for FACTOR_PATH, should I change normalize_factor for my data?

christear commented 9 months ago

Please just use the provided files under the regression directory. The output of the regression model is in RPM (reads per million), and you need to calculate PAS usage yourself.

CTLife commented 9 months ago

How do I calculate it? Could you please give a link or a brief description?

christear commented 9 months ago

First, annotate the PASs by overlapping the identified PASs with a gene annotation. Then, for each gene, calculate the usage of each PAS as PAS_expression / (sum of PAS_expression over all PASs from that gene). There is no link available for this.
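The per-gene usage described above can be sketched as follows. This is a minimal illustration, not code from APAIQ: it assumes the annotation step has already been done (for example with bedtools intersect against a gene annotation), that each PAS is assigned to exactly one gene, and that the expression values are the RPM values produced by the regression model. The function name and IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pas_usage(expression: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Compute PAS usage within each gene.

    expression: (gene_id, pas_id, expression) tuples, one per annotated PAS.
    Returns pas_id -> expression / (sum of expression of all PASs in that gene).
    Genes whose PASs sum to zero get NaN, because usage is undefined there.
    """
    rows = list(expression)
    gene_total: Dict[str, float] = defaultdict(float)
    for gene, _pas, value in rows:
        gene_total[gene] += value
    usage = {}
    for gene, pas, value in rows:
        total = gene_total[gene]
        usage[pas] = value / total if total > 0 else float("nan")
    return usage


if __name__ == "__main__":
    rows = [
        ("geneA", "pas1", 30.0),
        ("geneA", "pas2", 10.0),
        ("geneB", "pas3", 5.0),
    ]
    print(pas_usage(rows))  # {'pas1': 0.75, 'pas2': 0.25, 'pas3': 1.0}
```

Because usage is a ratio within one gene in one sample, the RPM scaling cancels out; the same numbers would come from raw read counts of the same sample.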

CTLife commented 9 months ago

OK. Thanks.

BTW: Although more than 10 tools are available for APA analysis, all of them have major drawbacks: https://rnajournal.cshlp.org/content/29/12/1839.full

I hope your tool will be maintained and updated, and that it will work well.

xinkaitong commented 9 months ago

I ran into the same problem. PAS usage quantification with the apaiq_reg command fails, even when I use the data in the APAIQ_release-main/demo directory.

christear commented 9 months ago

The error is probably caused by the header line. I will modify the code to skip it.
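One way a loader can tolerate a header row is sketched below. This is only an illustration of the idea, not the fix that went into APAIQ: it assumes a 6-column tab-separated BED file (chrom, start, end, name, score, strand), skips comment lines, skips the first data-looking line if it cannot be parsed (treating it as a column header such as "chrom start end ..."), and raises an error for any later malformed line so real format problems are not silently dropped. All names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class PasRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: float
    strand: str


def load_pas_bed(lines: Iterable[str]) -> List[PasRecord]:
    """Parse 6-column BED lines, skipping comments and one leading header row."""
    records = []
    first_row = True
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            rec = PasRecord(fields[0], int(fields[1]), int(fields[2]),
                            fields[3], float(fields[4]), fields[5])
        except (IndexError, ValueError):
            if first_row:
                # Unparseable first row: treat it as a column header.
                first_row = False
                continue
            raise ValueError(f"line {lineno}: malformed BED record: {line!r}")
        first_row = False
        records.append(rec)
    return records
```

Only the very first row gets this leniency, so a bad value in the middle of the file (for example a string in the score column) is still reported instead of being skipped.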

christear commented 9 months ago

The problem should be solved with the latest version of the code.

xinkaitong commented 9 months ago

OK, many thanks. I will try again with the latest code.

xinkaitong commented 9 months ago

Now it runs without errors on the data in the APAIQ_release-main/demo directory. But when I applied it to my Drosophila RNA-seq data, it raised an error: "Error message was: * Input error: Chromosome chr2L doesn't present in the .genome file."
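The message has the wording bedtools uses when an input interval sits on a chromosome that is not listed in the .genome file (a two-column file of chromosome name and length). One plausible cause, not confirmed in this thread, is that the genome file being used belongs to a human or mouse assembly (the help text lists hg19, hg38, mm10), which has no chr2L. The hypothetical sketch below lists the chromosomes in a bedGraph/BED file that are missing from a given .genome file, which is a quick way to check that kind of mismatch.

```python
from typing import Iterable, List, Set


def read_genome_chroms(genome_lines: Iterable[str]) -> Set[str]:
    """Chromosome names from a bedtools-style .genome file (chrom<TAB>length)."""
    return {line.split("\t")[0].strip() for line in genome_lines if line.strip()}


def missing_chroms(bed_lines: Iterable[str], genome_chroms: Set[str]) -> List[str]:
    """Sorted chromosome names used in a BED/bedGraph file but absent from the genome file."""
    used = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        used.add(line.split("\t")[0])
    return sorted(used - genome_chroms)


if __name__ == "__main__":
    import sys

    # Usage: python check_chroms.py coverage.bedGraph assembly.genome  (hypothetical names)
    with open(sys.argv[1]) as bed, open(sys.argv[2]) as genome:
        print(missing_chroms(bed, read_genome_chroms(genome)))
```

An empty list means every chromosome in the coverage file is known to the genome file; a list like ['chr2L', 'chr3R', ...] points to an assembly mismatch.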

xinkaitong commented 9 months ago

The most important thing I want to know is whether this software can be used for Drosophila melanogaster.

christear commented 8 months ago

It would be better to retrain a model specifically for the fruit fly.

christear / APAIQ_release