genome-nexus / annotation-tools

Tools developed for AACR GENIE to allow annotation of vcf and maf files from a number of centers and merging the results
MIT License

Center maf input isn't required to have all columns #5

Closed by thomasyu888 4 years ago

thomasyu888 commented 4 years ago

By GENIE convention, a center maf is required to have at least these columns: Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2, Tumor_Sample_Barcode, t_alt_count. It must also have either t_ref_count or t_depth. These are the optional headers: t_ref_count, n_depth, n_ref_count, n_alt_count
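For illustration, the column rule above can be written as a small header check. This is only a sketch: `check_center_maf_header` is a hypothetical helper, not something in standardize_mutation_data.py, the tab delimiter is assumed, and the spelling `t_alt_count` is an assumption based on the usual maf column name.

```python
# Hypothetical helper; not part of standardize_mutation_data.py.
REQUIRED_COLUMNS = (
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
    "Tumor_Sample_Barcode",
    "t_alt_count",  # assumed spelling of the tumor alt count column
)
# At least one of these must be present.
DEPTH_COLUMNS = ("t_ref_count", "t_depth")


def check_center_maf_header(header_line):
    """Return a list of problems found in a tab-delimited maf header line."""
    columns = set(header_line.rstrip("\r\n").split("\t"))
    problems = [
        "missing required column: " + name
        for name in REQUIRED_COLUMNS
        if name not in columns
    ]
    if not any(name in columns for name in DEPTH_COLUMNS):
        problems.append("need one of: " + ", ".join(DEPTH_COLUMNS))
    return problems
```

An empty list means the header satisfies the convention; any other column is simply ignored, so optional and extra headers pass through untouched.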

This makes standardize_mutation_data.py fail:

Loading data from input directory: ...

    Searching for files with extensions: vcf, maf, txt 

Loading data from file: .../data_mutations_extended_....txt
Traceback (most recent call last):
  File "genie-annotation-pkg/standardize_mutation_data.py", line 1429, in <module>
    main()
  File "genie-annotation-pkg/standardize_mutation_data.py", line 1426, in main
    generate_maf_from_input_data(input_directory, output_directory, extensions_list, center_name, sequence_source)
  File "genie-annotation-pkg/standardize_mutation_data.py", line 1394, in generate_maf_from_input_data
    maf_data = extract_maf_data_from_file(os.path.join(input_directory, filename), center_name, sequence_source)
  File "genie-annotation-pkg/standardize_mutation_data.py", line 1351, in extract_maf_data_from_file
    maf_record = create_maf_record_from_maf(data, center_name, sequence_source)
  File "genie-annotation-pkg/standardize_mutation_data.py", line 755, in create_maf_record_from_maf
    resolve_variant_allele_data(data, maf_data)
  File "genie-annotation-pkg/standardize_mutation_data.py", line 474, in resolve_variant_allele_data
    variant_type = resolve_variant_type(data, ref_allele, tumor_seq_allele)
  File "genie-annotation-pkg/standardize_mutation_data.py", line 276, in resolve_variant_type
    if variant_type == "1":
UnboundLocalError: local variable 'variant_type' referenced before assignment

[ERROR] standardizeMutationFilesFromDirectory(), error encountered while running genie-annotation-pkg/standardize_mutation_data.py
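The UnboundLocalError in the traceback means `variant_type` was read on a code path where no earlier branch had assigned it. The sketch below shows one defensive pattern, not the repository's actual `resolve_variant_type`: bind the name first, then fall back to inferring the type from the alleles. It assumes the missing input is a `Variant_Type` column (the traceback does not confirm this) and that `data` is a dict keyed by column name.

```python
def resolve_variant_type_sketch(data, ref_allele, tumor_seq_allele):
    """Hypothetical stand-in, not the repository's resolve_variant_type."""
    # Bind the name up front so no later check can see it unassigned.
    variant_type = (data.get("Variant_Type") or "").strip()
    if variant_type:
        return variant_type

    # Column absent or empty: infer the type from the allele lengths.
    # "-" is the maf placeholder for an empty allele.
    ref = "" if ref_allele in ("-", None) else ref_allele
    alt = "" if tumor_seq_allele in ("-", None) else tumor_seq_allele
    if len(ref) > len(alt):
        return "DEL"
    if len(ref) < len(alt):
        return "INS"
    # Equal lengths: substitution named by its size.
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")
```

With this shape, a file that omits the column still yields a value for every row instead of raising.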

I tried adding some try/except blocks to the code, but then Genome Nexus failed with errors like:

2020-04-27 02:49:44 [main] INFO  org.cbioportal.annotation.AnnotationPipeline - Starting AnnotationPipeline v1.0.0 on ip-10-5-19-203.ec2.internal with PID 25902 (/home/tyu/genome-nexus-annotation-pipeline/annotationPipeline/target/annotationPipeline-1.0.0.jar started by tyu in /home/tyu)
2020-04-27 02:49:45 [main] INFO  org.springframework.context.annotation.AnnotationConfigApplicationContext - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@1c5ecd10: startup date [Mon Apr 27 02:49:45 UTC 2020]; root of context hierarchy
2020-04-27 02:49:47 [main] INFO  org.springframework.integration.config.IntegrationRegistrar - No bean named 'integrationHeaderChannelRegistry' has been explicitly defined. Therefore, a default DefaultHeaderChannelRegistry will be created.
2020-04-27 02:49:47 [main] WARN  org.springframework.context.annotation.ConfigurationClassEnhancer - @Bean method ScopeConfiguration.stepScope is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean javadoc for complete details.
2020-04-27 02:49:47 [main] WARN  org.springframework.context.annotation.ConfigurationClassEnhancer - @Bean method ScopeConfiguration.jobScope is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean javadoc for complete details.
2020-04-27 02:49:47 [main] INFO  org.springframework.integration.config.DefaultConfiguringBeanFactoryPostProcessor - No bean named 'errorChannel' has been explicitly defined. Therefore, a default PublishSubscribeChannel will be created.
2020-04-27 02:49:47 [main] INFO  org.springframework.integration.config.DefaultConfiguringBeanFactoryPostProcessor - No bean named 'taskScheduler' has been explicitly defined. Therefore, a default ThreadPoolTaskScheduler will be created.
2020-04-27 02:49:47 [main] INFO  org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor - JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2020-04-27 02:49:47 [main] INFO  org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.1.3.Final
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration$$EnhancerBySpringCGLIB$$b9bfddf0] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'integrationGlobalProperties' of type [org.springframework.beans.factory.config.PropertiesFactoryBean] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'integrationGlobalProperties' of type [java.util.Properties] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'messageBuilderFactory' of type [org.springframework.integration.support.DefaultMessageBuilderFactory] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean '(inner bean)#7f682304' of type [org.springframework.integration.channel.MessagePublishingErrorHandler] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler - Initializing ExecutorService  'taskScheduler'
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'taskScheduler' of type [org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:48 [main] INFO  org.springframework.context.support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'integrationHeaderChannelRegistry' of type [org.springframework.integration.channel.DefaultHeaderChannelRegistry] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-04-27 02:49:49 [main] WARN  org.springframework.batch.core.listener.AbstractListenerFactoryBean - org.springframework.batch.item.ItemStreamReader is an interface.  The implementing class will not be queried for annotation based listener configurations.  If using @StepScope on a @Bean method, be sure to return the implementing class so listner annotations can be used.
2020-04-27 02:49:49 [main] WARN  org.springframework.batch.core.listener.AbstractListenerFactoryBean - org.springframework.batch.item.ItemStreamWriter is an interface.  The implementing class will not be queried for annotation based listener configurations.  If using @StepScope on a @Bean method, be sure to return the implementing class so listner annotations can be used.
2020-04-27 02:49:50 [main] INFO  org.springframework.jdbc.datasource.init.ScriptUtils - Executing SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql]
2020-04-27 02:49:50 [main] INFO  org.springframework.jdbc.datasource.init.ScriptUtils - Executed SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql] in 9 ms.
2020-04-27 02:49:51 [main] INFO  org.springframework.ui.velocity.SpringResourceLoader - SpringResourceLoader for Velocity: using resource loader [org.springframework.context.annotation.AnnotationConfigApplicationContext@1c5ecd10: startup date [Mon Apr 27 02:49:45 UTC 2020]; root of context hierarchy] and resource loader paths [classpath:/templates/]
2020-04-27 02:49:52 [main] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase -2147483648
2020-04-27 02:49:52 [main] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 0
2020-04-27 02:49:52 [main] INFO  org.springframework.integration.endpoint.EventDrivenConsumer - Adding {logging-channel-adapter:_org.springframework.integration.errorLogger} as a subscriber to the 'errorChannel' channel
2020-04-27 02:49:52 [main] INFO  org.springframework.integration.channel.PublishSubscribeChannel - Channel 'application.errorChannel' has 1 subscriber(s).
2020-04-27 02:49:52 [main] INFO  org.springframework.integration.endpoint.EventDrivenConsumer - started _org.springframework.integration.errorLogger
2020-04-27 02:49:52 [main] INFO  org.cbioportal.annotation.AnnotationPipeline - Started AnnotationPipeline in 7.513 seconds (JVM running for 8.639)
2020-04-27 02:49:52 [main] INFO  org.springframework.batch.core.repository.support.JobRepositoryFactoryBean - No database type set, using meta data indicating: HSQL
2020-04-27 02:49:52 [main] INFO  org.springframework.batch.core.launch.support.SimpleJobLauncher - No TaskExecutor has been set, defaulting to synchronous executor.
2020-04-27 02:49:52 [main] INFO  org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [SimpleJob: [name=annotationJob]] launched with the following parameters: [{filename=..._out/processed/data_mutations_extended_....txt.temp, outputFilename=..._out/annotated/data_mutations_extended_....txt.temp.annotated, replace=true, isoformOverride=uniprot, errorReportLocation=null, postIntervalSize=-1}]
2020-04-27 02:49:52 [main] INFO  org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step]
2020-04-27 02:49:53 [main] INFO  org.cbioportal.annotation.pipeline.MutationRecordReader - Loading records from: ..._out/processed/data_mutations_extended_....txt.temp
2020-04-27 02:49:56 [main] INFO  org.cbioportal.annotation.pipeline.MutationRecordReader - Loaded 13084 records from: ..._out/processed/data_mutations_extended_....txt.temp
2020-04-27 02:49:56 [main] INFO  org.cbioportal.annotator.internal.GenomeNexusImpl - 13084 records to annotate
2020-04-27 02:49:56 [main] ERROR org.springframework.batch.core.step.AbstractStep - Encountered an error executing step step in job annotationJob
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:592)
    at java.lang.Integer.valueOf(Integer.java:766)
    at org.cbioportal.annotator.internal.GenomeNexusImpl.extractGenomicLocation(GenomeNexusImpl.java:685)
    at org.cbioportal.annotator.internal.GenomeNexusImpl.extractGenomicLocationAsString(GenomeNexusImpl.java:674)
    at org.cbioportal.annotator.internal.GenomeNexusImpl.annotateRecord(GenomeNexusImpl.java:130)
    at org.cbioportal.annotator.internal.GenomeNexusImpl.annotateRecordsUsingGET(GenomeNexusImpl.java:164)
    at org.cbioportal.annotation.pipeline.MutationRecordReader.open(MutationRecordReader.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:133)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:121)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy55.open(Unknown Source)
    at org.springframework.batch.item.support.CompositeItemStream.open(CompositeItemStream.java:96)
    at org.springframework.batch.core.step.tasklet.TaskletStep.open(TaskletStep.java:310)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:197)
    at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
    at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:392)
    at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:135)
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:306)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy52.run(Unknown Source)
    at org.cbioportal.annotation.AnnotationPipeline.launchJob(AnnotationPipeline.java:88)
    at org.cbioportal.annotation.AnnotationPipeline.main(AnnotationPipeline.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53)
    at java.lang.Thread.run(Thread.java:748)
2020-04-27 02:49:56 [main] INFO  org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [SimpleJob: [name=annotationJob]] completed with the following parameters: [{filename=..._out/processed/data_mutations_extended_....txt.temp, outputFilename=..._out/annotated/data_mutations_extended_....txt.temp.annotated, replace=true, isoformOverride=uniprot, errorReportLocation=null, postIntervalSize=-1}] and the following status: [FAILED]
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.context.annotation.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@1c5ecd10: startup date [Mon Apr 27 02:49:45 UTC 2020]; root of context hierarchy
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Stopping beans in phase 0
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.integration.endpoint.EventDrivenConsumer - Removing {logging-channel-adapter:_org.springframework.integration.errorLogger} as a subscriber to the 'errorChannel' channel
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.integration.channel.PublishSubscribeChannel - Channel 'application.errorChannel' has 0 subscriber(s).
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.integration.endpoint.EventDrivenConsumer - stopped _org.springframework.integration.errorLogger
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Stopping beans in phase -2147483648
2020-04-27 02:49:56 [Thread-2] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler - Shutting down ExecutorService 'taskScheduler'

ao508 commented 4 years ago

Extra headers shouldn't cause any issues in theory. The columns can remain, but we will address the issues in the code that cause the exceptions shared above.
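The NumberFormatException in the Java log comes from `Integer.parseInt` receiving an empty string inside `extractGenomicLocation`, so some integer field in the processed file was blank. A minimal sketch for locating such rows before annotation is below. It assumes the blank field is `Start_Position` or `End_Position` (the stack trace does not say which) and that the file is tab-delimited with optional `#` comment lines; `find_bad_positions` is a hypothetical name.

```python
import csv

# Assumed candidates; the stack trace does not name the empty field.
POSITION_COLUMNS = ("Start_Position", "End_Position")


def find_bad_positions(maf_lines):
    """Return (data_row_number, column, value) for non-integer positions.

    maf_lines is any iterable of tab-delimited lines, such as an open file.
    A column missing from the header is reported with an empty value.
    """
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    bad = []
    for row_number, row in enumerate(reader, start=1):
        for column in POSITION_COLUMNS:
            value = (row.get(column) or "").strip()
            if not value.isdigit():
                bad.append((row_number, column, value))
    return bad
```

Running this over the standardized output would show whether blank positions are introduced during standardization or were already present in the center's file.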