This issue was found on Biofilter 2.4.2.
I ran pathway annotation on a list of 55889 genes; Biofilter ignored 22674 unrecognized identifiers and 114 ambiguous identifiers, and returned output for 18482 unique genes. I then re-ran Biofilter on the list of the 37407 genes that had been dropped from the first run (55889 - 18482), and got additional pathway annotations for 18 genes. So the input did change between iterations, but the second input was a subset of the original input.
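The subset claim and the arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes both gene lists are plain text files with one Ensembl identifier per line, and the helper names `read_ids` and `compare_inputs` are made up for illustration.

```python
def read_ids(lines):
    """Collect identifiers from an iterable of lines (one identifier per line).

    Blank lines and lines starting with '#' are skipped (assumed file layout).
    """
    ids = set()
    for line in lines:
        token = line.strip()
        if token and not token.startswith("#"):
            ids.add(token)
    return ids


def compare_inputs(original_ids, rerun_ids):
    """Summarize how the rerun input relates to the original input."""
    original_ids, rerun_ids = set(original_ids), set(rerun_ids)
    return {
        # True when every rerun identifier was also in the first input.
        "rerun_is_subset": rerun_ids <= original_ids,
        # Identifiers that appear only in the rerun input (should be empty).
        "only_in_rerun": sorted(rerun_ids - original_ids),
        # Identifiers from the first input that were not resubmitted.
        "not_resubmitted": len(original_ids - rerun_ids),
    }


# Counts quoted in the report: 55889 submitted, 18482 returned with output.
assert 55889 - 18482 == 37407
```

To use it on the real files, open each list and pass the file handle to `read_ids`, for example `read_ids(open("ROSMAP_RNAseq_removedbiofilt.txt"))`.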
Input Files:
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt
~group/personal/rasika/Biofilter_ROSMAP/RNAseq/ROSMAP_RNAseq_removedbiofilt
The biofilter commands I ran for 2.4.2:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways --overwrite
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt --overwrite
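Given the outputs of these two runs, the genes that were annotated only on the second pass can be listed with a set difference. The sketch below makes several assumptions that are not confirmed in this report: that each output is a tab-delimited text file, that header or comment lines start with `#`, and that the first column holds a gene label that is comparable across both runs. The function names are hypothetical.

```python
import csv


def annotated_genes(handle, column=0):
    """Return the set of gene labels found in one tab-delimited output.

    Assumed layout: tab-separated rows, '#'-prefixed header/comment lines,
    gene label in `column` (first column by default).
    """
    genes = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if len(row) > column and row[column].strip():
            genes.add(row[column].strip())
    return genes


def late_annotations(first_run_handle, rerun_handle):
    """Genes annotated in the rerun output but absent from the first output."""
    first = annotated_genes(first_run_handle)
    rerun = annotated_genes(rerun_handle)
    return sorted(rerun - first)
```

If the bug described here is present, `late_annotations` would return the genes that were silently skipped in the first run; on a correct run it should return an empty list, because every gene in the second input was also in the first.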
Expected behavior
I would have expected those 18 genes to be annotated in the original run, since they were included in my original gene set. Additionally, we would have expected the same behavior for those 18 genes whether the --annotate or the --filter flag was used.
These are the 18 genes in both lists that should have been annotated the first time around:
I did a quick retest on Biofilter 2.4.3 and could not reproduce the issue:
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_FPKM_gene_ensembl_list_edit.txt --gene-identifier-type ensembl_gid --annotate gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_ENSEMBL_gene_pathways_2.4.3 --overwrite
biofilter.py --knowledge ~/group/datasets/loki/loki-20220926.db --gene-file ROSMAP_RNAseq_removedbiofilt_2.4.3.txt --gene-identifier-type ensembl_gid --filter gene group source --source kegg reactome go --verbose --report-configuration --prefix ROSMAP_RNAseq_removedbiofilt_2.4.3 --overwrite
Based on some recent reruns, Biofilter 2.4.3 may have resolved the issue, but these files can still be used for integration testing.
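If these files are turned into an integration test, one possible starting point is a small helper that assembles the command line from the flags used in this report, so that the --filter and --annotate variants can be run under identical settings. This is a sketch, not part of Biofilter: `build_command` is a hypothetical helper, and it only uses the options that already appear in the commands above.

```python
def build_command(knowledge, gene_file, prefix, mode="filter"):
    """Assemble the biofilter.py argument list used in this report.

    `mode` selects between the --filter and --annotate variants; all other
    options are copied from the commands quoted in the report.
    """
    if mode not in ("filter", "annotate"):
        raise ValueError("mode must be 'filter' or 'annotate'")
    return [
        "biofilter.py",
        "--knowledge", knowledge,
        "--gene-file", gene_file,
        "--gene-identifier-type", "ensembl_gid",
        f"--{mode}", "gene", "group", "source",
        "--source", "kegg", "reactome", "go",
        "--verbose",
        "--report-configuration",
        "--prefix", prefix,
        "--overwrite",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list keeps every flag a separate token, and paths that start with `~` would need `os.path.expanduser` first because no shell is involved. A test could then run the full list and the removed-genes list and assert that the second run yields no new annotations.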