Open CTLife opened 9 months ago
Evaluation based on regression model
Loading RNAseq coverage
Reading facotrs from APAIQ/regression/normalize_factor
loading the PAS list from 1_APAIQ_All60/mRNA_SRR8462062_GSM3572799_NA18498_Input.predicted.txt
Traceback (most recent call last):
File "/home/yp/.conda/envs/apaiq_env/bin/apaiq_reg", line 33, in
Please use standard BED-format files as input to the regression model, in which the 5th column must be numeric. The APAIQ output is in bedGraph format, which is convenient to load into a genome browser for visualization. I will add more details about the regression.
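A quick way to check an input file against the requirement above (numeric 5th column) is sketched below. This is only an illustration and is not part of APAIQ: the function name `check_bed_score_column` and the example records are hypothetical, and the file is assumed to be tab-separated with optional `#`, `track`, or `browser` header lines.

```python
import io

def check_bed_score_column(lines):
    """Return (line_number, raw_line) pairs whose 5th column is missing or not numeric.

    Hypothetical helper for illustration; assumes tab-separated BED records.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            float(fields[4])  # 5th column (0-based index 4) must parse as a number
        except (IndexError, ValueError):
            bad.append((lineno, line))
    return bad

# Made-up example records: the second has a non-numeric score, the third has no 5th column.
example = io.StringIO(
    "chr1\t100\t101\tpas_1\t12.5\t+\n"
    "chr1\t200\t201\tpas_2\tNA\t-\n"
    "chr1\t300\t301\tpas_3\n"
)
bad_lines = check_bed_score_column(example)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` example.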
Is the 5th column of the BED file the 4th column of the apaiq output? For FACTOR_PATH, should I change normalize_factor for my data?
Please just use the provided files under the regression directory. The output of the regression model is RPM (reads per million), and you need to calculate PAS usage yourself.
How to calculate? Could you please give a link or a brief description?
First, annotate the PAS by overlapping the identified PAS with the gene annotation. Then, for each gene, the usage of each PAS is calculated as PAS_expression / (sum of PAS_expression over all PAS in that gene). There is no link available for this.
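The per-gene calculation described above can be sketched as follows. This is a minimal illustration, not code from APAIQ: it assumes each PAS has already been assigned to a gene (for example by overlapping with a gene annotation using a tool of your choice), and the function name `pas_usage`, the tuple layout, and the example values are all hypothetical.

```python
from collections import defaultdict

def pas_usage(records):
    """Compute PAS usage per gene.

    records: iterable of (gene_id, pas_id, expression) tuples, where expression
    is the RPM value for that PAS. Returns {(gene_id, pas_id): usage}.
    """
    records = list(records)
    # Sum the expression of all PAS assigned to each gene.
    totals = defaultdict(float)
    for gene, _pas, expr in records:
        totals[gene] += expr
    usage = {}
    for gene, pas, expr in records:
        total = totals[gene]
        # usage = PAS_expression / sum of PAS_expression for the gene
        usage[(gene, pas)] = expr / total if total > 0 else float("nan")
    return usage

# Made-up example values for illustration only.
records = [
    ("geneA", "pas1", 30.0),
    ("geneA", "pas2", 10.0),
    ("geneB", "pas1", 5.0),
]
usage = pas_usage(records)
```

With these example values, the two PAS of `geneA` get usages of 0.75 and 0.25, and the single PAS of `geneB` gets 1.0. Genes whose total expression is zero yield NaN rather than raising a division error.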
OK. Thanks.
BTW: although more than 10 tools are available for APA analysis, all of them have some big drawbacks: https://rnajournal.cshlp.org/content/29/12/1839.full
Hope your tool will be maintained and updated, and will work well.
I met the same problem. The quantification of PAS usage cannot be run with the apaiq_reg command, even though I used the data in the APAIQ_release-main/demo file.
The error is probably caused by the header line. I will modify the code to skip it.
The problem should be solved with the latest version of the code.
ok, many thanks. I will try it again with the latest code.
Now no error occurs when using the data in the APAIQ_release-main/demo file. But when I applied it to my Drosophila RNA-seq data, it raised an error: "Error message was: * Input error: Chromosome chr2L doesn't present in the .genome file. "
The most important thing I want to know is whether this software can be used for Drosophila melanogaster.
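An error like the one above means that a chromosome name in the input is not listed in the genome file being used. A small check such as the sketch below can list the mismatching names before running the tool. It is only an illustration under assumptions: the genome file is assumed to be tab-separated with the chromosome name in the first column, and the function name `missing_chromosomes` and the example data are hypothetical. It says nothing about whether the trained model itself is suitable for another species.

```python
import io

def missing_chromosomes(bedgraph_lines, genome_lines):
    """Return sorted chromosome names found in the bedGraph but not in the genome file."""
    # First column of each non-empty genome-file line is taken as the chromosome name.
    known = {line.split("\t")[0].strip() for line in genome_lines if line.strip()}
    seen = set()
    for line in bedgraph_lines:
        # Skip blank lines and common header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        seen.add(line.split("\t")[0].strip())
    return sorted(seen - known)

# Made-up example: the genome file lists chr1 and chr2, the bedGraph uses chr1 and chr2L.
genome = io.StringIO("chr1\t1000\nchr2\t2000\n")
bedgraph = io.StringIO("chr1\t0\t10\t3.0\nchr2L\t5\t15\t1.0\n")
missing = missing_chromosomes(bedgraph, genome)
```

In this example `missing` contains only `chr2L`, the name that the genome file does not list.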
It would be better to retrain a model specifically for fruit fly.
How to quantify?
Quantification is very important for me; it is my purpose.
"python APAIQ/regression/evaluateRegression.v.2.py -h" showed:
usage: evaluateRegression.v.2.py [-h] --model MODEL --factor_path FACTOR_PATH [--input_file INPUT_FILE] [--input_plus INPUT_PLUS] [--input_minus INPUT_MINUS] [--pas_file PAS_FILE] [--out OUT] [--threshold THRESHOLD] [--depth DEPTH] [--window WINDOW] [--genome GENOME]
Evaluate each locus with RNAseq coverage exceed threshold and return prediction score.
optional arguments:
  -h, --help                  show this help message and exit
  --model MODEL               the model weights file
  --factor_path FACTOR_PATH   normalization file path
  --input_file INPUT_FILE     unstranded bedGraph file
  --input_plus INPUT_PLUS     plus strand bedGraph file
  --input_minus INPUT_MINUS   minus strand bedGraph file
  --pas_file PAS_FILE         pAs location file to be predicted expression level
  --out OUT                   output file path
  --threshold THRESHOLD       peak length lower than threshold will be fiter out
  --depth DEPTH               total number of mapped reads( in millions)
  --window WINDOW             input length
  --genome GENOME             assembly name of the genome. i.e. hg19, hg38, mm10
For FACTOR_PATH, should I change normalize_factor for my data? https://github.com/christear/APAIQ_release/tree/main/regression
Is PAS_FILE the output of apaiq?
Is --genome required?
Could you please write a wiki or detailed steps for apaiq? Thanks.