PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

Error parsing date #159

Closed. anh151 closed this issue 10 months ago

anh151 commented 10 months ago

Hello,

PharmCAT version: 2.8.2
Environment: Linux Mint
Java: JDK 21
bcftools/bgzip/tabix: 1.18

We are trying to run PharmCAT on some local data. I get the same error whether I run the PharmCAT pipeline or each step individually. I also tried v2.8.1 and hit the same issue; I can try other versions if needed. This is a little time-sensitive, so if there is an older version that you don't think has this issue, I can use that in the meantime.

cd /home/andrew/Desktop/bin/preprocessor && python3 -m pipenv run python /home/andrew/Desktop/bin/preprocessor/pharmcat_pipeline /home/andrew/Desktop/discovery/pharmcat_ready.vcf.gz --missing-to-ref -o /home/andrew/Desktop/discovery/pharmcat_10202023 -matcher
/home/andrew/.local/bin/jdk-21+35/bin/java -jar /home/andrew/Desktop/bin/preprocessor/pharmcat.jar -vcf pharmcat_10202023/ready.preprocessed.vcf.bgz -matcher
PharmCAT version: 2.8.2

Warning: Argument "-0"/"--missing-to-ref" supplied

THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.

Running PharmCAT with positions as missing vs reference can
lead to different results.

Processing /home/andrew/Desktop/discovery/pharmcat_ready.vcf.gz ...
/home/andrew/Desktop/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
  * WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
  * WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/andrew/Desktop/discovery/pharmcat_10202023/pharmcat_ready.missing_pgx_var.vcf

Running PharmCAT...
Checking files...
* Found 1 VCF file

Queueing up 1702 samples to process...
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
    at com.google.gson.Gson.fromJson(Gson.java:1227)
    at com.google.gson.Gson.fromJson(Gson.java:1137)
    at com.google.gson.Gson.fromJson(Gson.java:1075)
    at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
    at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
    at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
    at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
    ... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
    ... 17 more

Thanks! Andrew

markwoon commented 10 months ago

We can't reproduce this problem on 2.8.2. Are you using the pharmcat jar as is?

This appears to be a problem parsing the allele translation data, which was fixed a long time ago...

whaleyr commented 10 months ago

> We can't reproduce this problem on 2.8.2. Are you using the pharmcat jar as is?
>
> This appears to be a problem parsing the allele translation data, which was fixed a long time ago...

Right, like he said. Are you using a version of the jar file you compiled yourself or are you using the jar downloaded from the release page? I tried an example VCF using the downloaded jar from the release page and I'm not seeing this problem.

anh151 commented 10 months ago

I appreciate the quick responses.

Well, the pharmcat_pipeline downloads the pharmcat jar file during its first run, along with the reference sequence. Below is a run in which I let the pipeline download the .jar file.

Could this be an environment- or data-specific issue? I can provide an example file if that would help, or I can try in another environment.

cd /home/andrew/Desktop/bin/preprocessor && python3 -m pipenv run python /home/andrew/Desktop/bin/preprocessor/pharmcat_pipeline /home/andrew/Desktop/discovery/test.vcf.gz --missing-to-ref -o /home/andrew/Desktop/discovery/pharmcat_10202023 -matcher
PharmCAT version: 2.8.2

Warning: Argument "-0"/"--missing-to-ref" supplied

THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.

Running PharmCAT with positions as missing vs reference can
lead to different results.


Downloading pharmcat.jar...
Processing /home/andrew/Desktop/discovery/test.vcf.gz ...
/home/andrew/Desktop/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
  * WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
  * WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/andrew/Desktop/discovery/pharmcat_10202023/test.missing_pgx_var.vcf

Running PharmCAT...
Checking files...
* Found 1 VCF file
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
    at com.google.gson.Gson.fromJson(Gson.java:1227)
    at com.google.gson.Gson.fromJson(Gson.java:1137)
    at com.google.gson.Gson.fromJson(Gson.java:1075)
    at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
    at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
    at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
    at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
    ... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
    ... 17 more



I also tried running in the All of Us environment and I get the same error.

export JAVA_HOME="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/jdk-21+35" && \
export BCFTOOLS_PATH="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/bcftools" && \
export BGZIP_PATH="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/bgzip" && \
cd bin/preprocessor && \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/python/bin/python3.9 -m pipenv run python \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/preprocessor/pharmcat_pipeline \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/test.vcf.gz --missing-to-ref -o \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/pharmcat_test -matcher
PharmCAT version: 2.8.2

Warning: Argument "-0"/"--missing-to-ref" supplied

THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.

Running PharmCAT with positions as missing vs reference can
lead to different results.

Downloading pharmcat.jar...
Only 1 CPU, cannot use concurrent mode
Processing /home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/test.vcf.gz ...
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
  * WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
  * WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/pharmcat_test/test.missing_pgx_var.vcf

Running PharmCAT...
Checking files...
* Found 1 VCF file
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
    at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
    at com.google.gson.Gson.fromJson(Gson.java:1227)
    at com.google.gson.Gson.fromJson(Gson.java:1137)
    at com.google.gson.Gson.fromJson(Gson.java:1075)
    at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
    at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
    at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
    at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
    at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
    at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
    ... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep 
    at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
    at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
    ... 17 more

BinglanLi commented 10 months ago

Hi Andrew, thanks for the detailed error messages. I was able to replicate the issue using specifically JDK21. We are looking into the issue now.
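
One plausible explanation, which nobody in this thread confirms: JDK 20 and later ship newer CLDR locale data in which the default US time format puts a narrow no-break space (U+202F) before AM/PM. A date string written by an older JDK with an ordinary space, such as the one in the stack trace, may then no longer match the locale-default format, and the parser ends up in the ISO 8601 fallback seen in the trace. The sketch below is a standalone check that uses only JDK classes. The class and method names are hypothetical, and this is neither PharmCAT's code nor its actual fix.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of PharmCAT or Gson.
final class DateParseCheck {
  // The exact string from the stack trace (ordinary ASCII space before "PM").
  static final String SAMPLE = "Sep 27, 2023, 7:48:25 PM";

  /** Tries the JDK's locale-default US date-time format; the outcome depends on the JDK version. */
  static boolean parsesWithLocaleDefault(String text) {
    DateFormat df = DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, Locale.US);
    try {
      df.parse(text);
      return true;
    } catch (ParseException e) {
      return false;
    }
  }

  /** Parses with an explicit pattern, which does not change with the JDK's locale data. */
  static Date parseWithFixedPattern(String text) throws ParseException {
    SimpleDateFormat df = new SimpleDateFormat("MMM d, yyyy, h:mm:ss a", Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: treat the timestamp as UTC
    df.setLenient(false);
    // Normalize a narrow no-break space to a plain space in case a newer JDK wrote the text.
    return df.parse(text.replace('\u202F', ' '));
  }

  public static void main(String[] args) throws ParseException {
    System.out.println("Java version: " + System.getProperty("java.version"));
    System.out.println("Locale-default format parses sample: " + parsesWithLocaleDefault(SAMPLE));
    System.out.println("Fixed pattern, epoch ms: " + parseWithFixedPattern(SAMPLE).getTime());
  }
}
```

Running `main` under JDK 17 and again under JDK 21 shows whether the locale-default result differs between the two runtimes, while the fixed-pattern parse should behave the same on both.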

whaleyr commented 10 months ago

If you need to run this now, please try JDK 17 instead; that should work.
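
To confirm which Java runtime a command actually picks up before switching, a small check like the following may help. It is only a sketch: the class name and the warning text are hypothetical. The threshold of 21 reflects only what this thread reports (failure on JDK 21, success on JDK 17), and other versions were not tested here.

```java
// Hypothetical helper for illustration; not part of PharmCAT.
final class JavaVersionCheck {
  /** Extracts the feature version from java.specification.version ("1.8" -> 8, "17" -> 17). */
  static int featureVersion(String specVersion) {
    String[] parts = specVersion.trim().split("\\.");
    int first = Integer.parseInt(parts[0]);
    // Java 8 and earlier report "1.x"; Java 9 and later report the feature number directly.
    return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
  }

  public static void main(String[] args) {
    int feature = featureVersion(System.getProperty("java.specification.version"));
    System.out.println("Java feature version: " + feature);
    if (feature >= 21) {
      // Assumption based solely on this thread's reports.
      System.out.println("PharmCAT 2.8.2 was reported to fail on JDK 21; consider JDK 17.");
    }
  }
}
```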

anh151 commented 10 months ago

Thanks for the help! JDK17 was successful.

-Andrew

markwoon commented 10 months ago

PharmCAT 2.8.3 has been released and will work with Java 21.