DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

ES requests are not logged before they are made #4233

Closed by achave11-ucsc 11 months ago

achave11-ucsc commented 2 years ago

…only when they are done. This means that we can't know for sure which request the Lambda was making while it was waiting for Elasticsearch to respond.

Elasticsearch requests (see https://github.com/DataBiosphere/azul/issues/3312) are logged when AZUL_DEBUG is 2. When AZUL_DEBUG is 1 or lower, nothing is logged.

We want to log the first 1k of ES requests and responses when AZUL_DEBUG is 1 and the entire requests and responses when it is 2.
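A minimal sketch of the behavior requested above, using only the Python standard library. This is not Azul's implementation: the names `format_body`, `logged_request`, `debug_level` and `MAX_BODY_CHARS` are hypothetical, "1k" is assumed to mean 1024 characters, and `send` stands in for whatever actually performs the HTTP call. The point it illustrates is that the request is logged before it is sent, so the log shows what the Lambda was waiting on.

```python
import logging
import os
from typing import Callable, Optional, Tuple

log = logging.getLogger('elasticsearch')

# Assumption: "the first 1k" is interpreted as the first 1024 characters.
MAX_BODY_CHARS = 1024


def debug_level() -> int:
    """Read AZUL_DEBUG from the environment, defaulting to 0 when unset."""
    return int(os.environ.get('AZUL_DEBUG', '0'))


def format_body(body: str, level: int) -> Optional[str]:
    """Return the text to log for a body at the given level, or None to skip."""
    if level >= 2:
        return body  # entire request or response
    elif level == 1:
        if len(body) > MAX_BODY_CHARS:
            return body[:MAX_BODY_CHARS] + '...'  # truncated prefix
        return body
    else:
        return None  # bodies are not logged at level 0


def logged_request(method: str,
                   url: str,
                   body: str,
                   send: Callable[[str, str, str], Tuple[int, str]],
                   level: Optional[int] = None) -> Tuple[int, str]:
    """Log a request before sending it, then log the response afterwards.

    `send` is a hypothetical callable that performs the request and
    returns a (status, response body) pair.
    """
    level = debug_level() if level is None else level
    # Logged BEFORE the request is made, so a hung request is still visible.
    log.info('Making %s request to %s', method, url)
    text = format_body(body, level)
    if text is not None:
        log.debug('> %s', text)
    status, response = send(method, url, body)
    log.info('Got status %s for %s %s', status, method, url)
    text = format_body(response, level)
    if text is not None:
        log.debug('< %s', text)
    return status, response
```

Whether the INFO lines appear at AZUL_DEBUG 0 depends on how the logger itself is configured, which this sketch leaves open.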

melainalegaspi commented 2 years ago

Assignee to spike in order to determine whether documents are logged twice during reindex when AZUL_DEBUG is 2 and post evidence.

achave11-ucsc commented 2 years ago

The documents are not logged twice. The acknowledgement of the indexed document from Elasticsearch is much smaller than the indexed documents.

@message
[INFO] 2022-06-24T16:40:18.941Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch PUT https://search-azul-index-sandbox-mcwjphhhdivigzrsrdmxm2uude.us-east-1.es.amazonaws.com:443/azul_v2_abrahamsc_dcp2_files/_doc/b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04%3A59%3A36.152000Z_exists?op_type=create&refresh=false [status:201 request:0.509s]
[DEBUG] 2022-06-24T16:40:18.941Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch > {"entity_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","contents":{"sample_specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"samples":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","entity_type":"specimens","organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"model_organ":"~null","model_organ_part":"~null","effective_organ":"brain"}],"sequencing_inputs":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","sequencing_input_type":"cell_suspension"}],"specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"cell_suspensions":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","total_estimated_cells":1400,"total_estimated_cells_":1400,"selected_cell_type":["~null"],"organ":["brain"],"organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"]}],"cell_lines":[],"donors":[{"document_id":"d194868c-c19b-4dba-a4db-3debc5212302","biomaterial_id":"fetal_mouse_donor","biological_sex":"unknown","genus_species":["Mus musculus"],"development_stage":"Theiler stage 
26","diseases":["normal"],"organism_age":"~null","organism_age_unit":"~null"}],"organoids":[],"files":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","content-type":"binary/octet-stream; dcp-type=data; dcp-type=data","indexed":0,"name":"SRR9008425_1.fastq.gz","crc32c":"fb4d6dcf","sha256":"f146aa11345ee47bb64a689b270e934d3382384e0f2236a7f26437074ca14d60","size":10190894575,"size_":10190894575,"uuid":"9cdd9639-3a5c-4889-b5f4-6893f114af46","drs_path":"v1_4c15703d-2502-4863-8d82-fc1b6ccf65f0_15480fbe-180b-4e42-97f8-2bca7c59e756","version":"2021-04-22T13:36:38.450000Z","file_type":"sequence_file","file_format":"fastq.gz","content_description":["DNA sequence"],"is_intermediate":9223372036854774784,"file_source":"~null","_type":"file","read_index":"read1","lane_index":1,"lane_index_":1,"related_files":[]}],"analysis_protocols":[],"imaging_protocols":[],"library_preparation_protocols":[{"document_id":"b4ec8457-7819-4efa-83e4-c855c8ade49c","library_construction_approach":"cDNA library construction","nucleic_acid_source":"single cell"}],"sequencing_protocols":[{"document_id":"48020340-98ab-4c1a-b79e-c461fe7191f9","instrument_manufacturer_model":"ONT PromethION","paired_end":0}],"sequencing_processes":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218"}],"dates":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","submission_date":"2021-04-07T04:59:36.004000Z","update_date":"2021-04-22T13:36:42.706000Z","last_modified_date":"2021-04-22T13:36:42.706000Z","aggregate_last_modified_date":"9999-01-01T00:00:00.000000Z","aggregate_submission_date":"9999-01-01T00:00:00.000000Z","aggregate_update_date":"9999-01-01T00:00:00.000000Z"}],"projects":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","project_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","project_description":"Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell heterogeneity. 
However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. In consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows to generate high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.","project_short_name":"Nanopore_scSequencing","laboratory":[" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","CNRS, Institut de Pharmacologie Moléculaire et Cellulaire"],"institutions":["Université Côte d'Azur"],"contact_names":["Kevin,Lebrigand","Pascal,Barbry","Rainer,Waldmann","Virginie,Magnone"],"contributors":[{"contact_name":"Rainer,Waldmann","corresponding_contributor":1,"email":"rainer@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Pascal,Barbry","corresponding_contributor":1,"email":"barbry@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Virginie,Magnone","corresponding_contributor":9223372036854774784,"email":"~null","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et 
Cellulaire","project_role":"~null"},{"contact_name":"Kevin,Lebrigand","corresponding_contributor":1,"email":"lebrigand@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"}],"publication_titles":["High throughput error corrected Nanopore single cell transcriptome sequencing."],"publications":[{"publication_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","publication_url":"https://doi.org/10.1038/s41467-020-17800-6","official_hca_publication":9223372036854774784,"doi":"10.1038/s41467-020-17800-6"}],"supplementary_links":["https://github.com/ucagenomix/sicelore"],"_type":"project","accessions":[{"namespace":"geo_series","accession":"GSE130708"},{"namespace":"insdc_project","accession":"SRP194984"},{"namespace":"insdc_study","accession":"PRJNA541014"}],"estimated_cell_count":9223372036854774784,"estimated_cell_count_":null}]},"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","source":{"id":"4c15703d-2502-4863-8d82-fc1b6ccf65f0","spec":"tdr:datarepo-dev-fef02a92:snapshot/hca_dev_0d4b87ea6e9e456982e41343e0e3259f__20210827_20210903:/0"},"bundle_uuid":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","bundle_version":"2021-04-07T04:59:36.152000Z","bundle_deleted":false}
[DEBUG] 2022-06-24T16:40:18.941Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch < {"_index":"azul_v2_abrahamsc_dcp2_files","_type":"_doc","_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
[INFO] 2022-06-24T16:40:19.101Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch PUT https://search-azul-index-sandbox-mcwjphhhdivigzrsrdmxm2uude.us-east-1.es.amazonaws.com:443/azul_v2_abrahamsc_dcp2_cell_suspensions/_doc/94ca3100-fdf2-4514-b0af-817ccec8919f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04%3A59%3A36.152000Z_exists?op_type=create&refresh=false [status:201 request:0.158s]
[DEBUG] 2022-06-24T16:40:19.101Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch > {"entity_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","contents":{"sample_specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"samples":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","entity_type":"specimens","organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"model_organ":"~null","model_organ_part":"~null","effective_organ":"brain"}],"sequencing_inputs":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","sequencing_input_type":"cell_suspension"}],"specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"cell_suspensions":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","total_estimated_cells":1400,"total_estimated_cells_":1400,"selected_cell_type":["~null"],"organ":["brain"],"organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"]}],"cell_lines":[],"donors":[{"document_id":"d194868c-c19b-4dba-a4db-3debc5212302","biomaterial_id":"fetal_mouse_donor","biological_sex":"unknown","genus_species":["Mus musculus"],"development_stage":"Theiler stage 
26","diseases":["normal"],"organism_age":"~null","organism_age_unit":"~null"}],"organoids":[],"files":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","content-type":"binary/octet-stream; dcp-type=data; dcp-type=data","indexed":0,"name":"SRR9008425_1.fastq.gz","crc32c":"fb4d6dcf","sha256":"f146aa11345ee47bb64a689b270e934d3382384e0f2236a7f26437074ca14d60","size":10190894575,"size_":10190894575,"uuid":"9cdd9639-3a5c-4889-b5f4-6893f114af46","drs_path":"v1_4c15703d-2502-4863-8d82-fc1b6ccf65f0_15480fbe-180b-4e42-97f8-2bca7c59e756","version":"2021-04-22T13:36:38.450000Z","file_type":"sequence_file","file_format":"fastq.gz","content_description":["DNA sequence"],"is_intermediate":9223372036854774784,"file_source":"~null","_type":"file","read_index":"read1","lane_index":1,"lane_index_":1,"related_files":[]}],"analysis_protocols":[],"imaging_protocols":[],"library_preparation_protocols":[{"document_id":"b4ec8457-7819-4efa-83e4-c855c8ade49c","library_construction_approach":"cDNA library construction","nucleic_acid_source":"single cell"}],"sequencing_protocols":[{"document_id":"48020340-98ab-4c1a-b79e-c461fe7191f9","instrument_manufacturer_model":"ONT PromethION","paired_end":0}],"sequencing_processes":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218"}],"dates":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","submission_date":"2021-04-07T04:59:35.393000Z","update_date":"2021-04-07T04:59:42.025000Z","last_modified_date":"2021-04-07T04:59:42.025000Z","aggregate_last_modified_date":"9999-01-01T00:00:00.000000Z","aggregate_submission_date":"9999-01-01T00:00:00.000000Z","aggregate_update_date":"9999-01-01T00:00:00.000000Z"}],"projects":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","project_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","project_description":"Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell heterogeneity. 
However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. In consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows to generate high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.","project_short_name":"Nanopore_scSequencing","laboratory":[" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","CNRS, Institut de Pharmacologie Moléculaire et Cellulaire"],"institutions":["Université Côte d'Azur"],"contact_names":["Kevin,Lebrigand","Pascal,Barbry","Rainer,Waldmann","Virginie,Magnone"],"contributors":[{"contact_name":"Rainer,Waldmann","corresponding_contributor":1,"email":"rainer@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Pascal,Barbry","corresponding_contributor":1,"email":"barbry@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Virginie,Magnone","corresponding_contributor":9223372036854774784,"email":"~null","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et 
Cellulaire","project_role":"~null"},{"contact_name":"Kevin,Lebrigand","corresponding_contributor":1,"email":"lebrigand@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"}],"publication_titles":["High throughput error corrected Nanopore single cell transcriptome sequencing."],"publications":[{"publication_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","publication_url":"https://doi.org/10.1038/s41467-020-17800-6","official_hca_publication":9223372036854774784,"doi":"10.1038/s41467-020-17800-6"}],"supplementary_links":["https://github.com/ucagenomix/sicelore"],"_type":"project","accessions":[{"namespace":"geo_series","accession":"GSE130708"},{"namespace":"insdc_project","accession":"SRP194984"},{"namespace":"insdc_study","accession":"PRJNA541014"}],"estimated_cell_count":9223372036854774784,"estimated_cell_count_":null}]},"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","source":{"id":"4c15703d-2502-4863-8d82-fc1b6ccf65f0","spec":"tdr:datarepo-dev-fef02a92:snapshot/hca_dev_0d4b87ea6e9e456982e41343e0e3259f__20210827_20210903:/0"},"bundle_uuid":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","bundle_version":"2021-04-07T04:59:36.152000Z","bundle_deleted":false}
[DEBUG] 2022-06-24T16:40:19.101Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch < {"_index":"azul_v2_abrahamsc_dcp2_cell_suspensions","_type":"_doc","_id":"94ca3100-fdf2-4514-b0af-817ccec8919f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
[INFO] 2022-06-24T16:40:19.313Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch PUT https://search-azul-index-sandbox-mcwjphhhdivigzrsrdmxm2uude.us-east-1.es.amazonaws.com:443/azul_v2_abrahamsc_dcp2_samples/_doc/8a78fb4d-e0ab-4449-9324-50a5fee74327_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04%3A59%3A36.152000Z_exists?op_type=create&refresh=false [status:201 request:0.210s]
[DEBUG] 2022-06-24T16:40:19.313Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch > {"entity_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","contents":{"sample_specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"samples":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","entity_type":"specimens","organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"model_organ":"~null","model_organ_part":"~null","effective_organ":"brain"}],"sequencing_inputs":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","sequencing_input_type":"cell_suspension"}],"specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"cell_suspensions":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","total_estimated_cells":1400,"total_estimated_cells_":1400,"selected_cell_type":["~null"],"organ":["brain"],"organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"]}],"cell_lines":[],"donors":[{"document_id":"d194868c-c19b-4dba-a4db-3debc5212302","biomaterial_id":"fetal_mouse_donor","biological_sex":"unknown","genus_species":["Mus musculus"],"development_stage":"Theiler stage 
26","diseases":["normal"],"organism_age":"~null","organism_age_unit":"~null"}],"organoids":[],"files":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","content-type":"binary/octet-stream; dcp-type=data; dcp-type=data","indexed":0,"name":"SRR9008425_1.fastq.gz","crc32c":"fb4d6dcf","sha256":"f146aa11345ee47bb64a689b270e934d3382384e0f2236a7f26437074ca14d60","size":10190894575,"size_":10190894575,"uuid":"9cdd9639-3a5c-4889-b5f4-6893f114af46","drs_path":"v1_4c15703d-2502-4863-8d82-fc1b6ccf65f0_15480fbe-180b-4e42-97f8-2bca7c59e756","version":"2021-04-22T13:36:38.450000Z","file_type":"sequence_file","file_format":"fastq.gz","content_description":["DNA sequence"],"is_intermediate":9223372036854774784,"file_source":"~null","_type":"file","read_index":"read1","lane_index":1,"lane_index_":1,"related_files":[]}],"analysis_protocols":[],"imaging_protocols":[],"library_preparation_protocols":[{"document_id":"b4ec8457-7819-4efa-83e4-c855c8ade49c","library_construction_approach":"cDNA library construction","nucleic_acid_source":"single cell"}],"sequencing_protocols":[{"document_id":"48020340-98ab-4c1a-b79e-c461fe7191f9","instrument_manufacturer_model":"ONT PromethION","paired_end":0}],"sequencing_processes":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218"}],"dates":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","submission_date":"2021-04-07T04:59:35.383000Z","update_date":"2021-04-07T04:59:41.590000Z","last_modified_date":"2021-04-07T04:59:41.590000Z","aggregate_last_modified_date":"9999-01-01T00:00:00.000000Z","aggregate_submission_date":"9999-01-01T00:00:00.000000Z","aggregate_update_date":"9999-01-01T00:00:00.000000Z"}],"projects":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","project_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","project_description":"Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell heterogeneity. 
However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. In consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows to generate high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.","project_short_name":"Nanopore_scSequencing","laboratory":[" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","CNRS, Institut de Pharmacologie Moléculaire et Cellulaire"],"institutions":["Université Côte d'Azur"],"contact_names":["Kevin,Lebrigand","Pascal,Barbry","Rainer,Waldmann","Virginie,Magnone"],"contributors":[{"contact_name":"Rainer,Waldmann","corresponding_contributor":1,"email":"rainer@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Pascal,Barbry","corresponding_contributor":1,"email":"barbry@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Virginie,Magnone","corresponding_contributor":9223372036854774784,"email":"~null","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et 
Cellulaire","project_role":"~null"},{"contact_name":"Kevin,Lebrigand","corresponding_contributor":1,"email":"lebrigand@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"}],"publication_titles":["High throughput error corrected Nanopore single cell transcriptome sequencing."],"publications":[{"publication_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","publication_url":"https://doi.org/10.1038/s41467-020-17800-6","official_hca_publication":9223372036854774784,"doi":"10.1038/s41467-020-17800-6"}],"supplementary_links":["https://github.com/ucagenomix/sicelore"],"_type":"project","accessions":[{"namespace":"geo_series","accession":"GSE130708"},{"namespace":"insdc_project","accession":"SRP194984"},{"namespace":"insdc_study","accession":"PRJNA541014"}],"estimated_cell_count":9223372036854774784,"estimated_cell_count_":null}]},"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","source":{"id":"4c15703d-2502-4863-8d82-fc1b6ccf65f0","spec":"tdr:datarepo-dev-fef02a92:snapshot/hca_dev_0d4b87ea6e9e456982e41343e0e3259f__20210827_20210903:/0"},"bundle_uuid":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","bundle_version":"2021-04-07T04:59:36.152000Z","bundle_deleted":false}
[DEBUG] 2022-06-24T16:40:19.313Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch < {"_index":"azul_v2_abrahamsc_dcp2_samples","_type":"_doc","_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
[INFO] 2022-06-24T16:40:19.588Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch PUT https://search-azul-index-sandbox-mcwjphhhdivigzrsrdmxm2uude.us-east-1.es.amazonaws.com:443/azul_v2_abrahamsc_dcp2_projects/_doc/0d4b87ea-6e9e-4569-82e4-1343e0e3259f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04%3A59%3A36.152000Z_exists?op_type=create&refresh=false [status:201 request:0.273s]
[DEBUG] 2022-06-24T16:40:19.588Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch > {"entity_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","contents":{"sample_specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"samples":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","entity_type":"specimens","organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"model_organ":"~null","model_organ_part":"~null","effective_organ":"brain"}],"sequencing_inputs":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","sequencing_input_type":"cell_suspension"}],"specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"cell_suspensions":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","total_estimated_cells":1400,"total_estimated_cells_":1400,"selected_cell_type":["~null"],"organ":["brain"],"organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"]}],"cell_lines":[],"donors":[{"document_id":"d194868c-c19b-4dba-a4db-3debc5212302","biomaterial_id":"fetal_mouse_donor","biological_sex":"unknown","genus_species":["Mus musculus"],"development_stage":"Theiler stage 
26","diseases":["normal"],"organism_age":"~null","organism_age_unit":"~null"}],"organoids":[],"files":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","content-type":"binary/octet-stream; dcp-type=data; dcp-type=data","indexed":0,"name":"SRR9008425_1.fastq.gz","crc32c":"fb4d6dcf","sha256":"f146aa11345ee47bb64a689b270e934d3382384e0f2236a7f26437074ca14d60","size":10190894575,"size_":10190894575,"uuid":"9cdd9639-3a5c-4889-b5f4-6893f114af46","drs_path":"v1_4c15703d-2502-4863-8d82-fc1b6ccf65f0_15480fbe-180b-4e42-97f8-2bca7c59e756","version":"2021-04-22T13:36:38.450000Z","file_type":"sequence_file","file_format":"fastq.gz","content_description":["DNA sequence"],"is_intermediate":9223372036854774784,"file_source":"~null","_type":"file","read_index":"read1","lane_index":1,"lane_index_":1,"related_files":[]}],"analysis_protocols":[],"imaging_protocols":[],"library_preparation_protocols":[{"document_id":"b4ec8457-7819-4efa-83e4-c855c8ade49c","library_construction_approach":"cDNA library construction","nucleic_acid_source":"single cell"}],"sequencing_protocols":[{"document_id":"48020340-98ab-4c1a-b79e-c461fe7191f9","instrument_manufacturer_model":"ONT PromethION","paired_end":0}],"sequencing_processes":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218"}],"matrices":[],"contributed_analyses":[],"dates":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","submission_date":"2021-01-14T15:34:51.095000Z","update_date":"2021-04-07T04:59:36.897000Z","last_modified_date":"2021-04-07T04:59:36.897000Z","aggregate_last_modified_date":"2021-04-26T02:28:36.139000Z","aggregate_submission_date":"2021-01-14T15:34:51.095000Z","aggregate_update_date":"2021-04-26T02:28:36.139000Z"}],"projects":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","project_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","project_description":"Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell 
heterogeneity. However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. In consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows to generate high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.","project_short_name":"Nanopore_scSequencing","laboratory":[" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","CNRS, Institut de Pharmacologie Moléculaire et Cellulaire"],"institutions":["Université Côte d'Azur"],"contact_names":["Kevin,Lebrigand","Pascal,Barbry","Rainer,Waldmann","Virginie,Magnone"],"contributors":[{"contact_name":"Rainer,Waldmann","corresponding_contributor":1,"email":"rainer@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Pascal,Barbry","corresponding_contributor":1,"email":"barbry@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Virginie,Magnone","corresponding_contributor":9223372036854774784,"email":"~null","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et 
Cellulaire","project_role":"~null"},{"contact_name":"Kevin,Lebrigand","corresponding_contributor":1,"email":"lebrigand@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"}],"publication_titles":["High throughput error corrected Nanopore single cell transcriptome sequencing."],"publications":[{"publication_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","publication_url":"https://doi.org/10.1038/s41467-020-17800-6","official_hca_publication":9223372036854774784,"doi":"10.1038/s41467-020-17800-6"}],"supplementary_links":["https://github.com/ucagenomix/sicelore"],"_type":"project","accessions":[{"namespace":"geo_series","accession":"GSE130708"},{"namespace":"insdc_project","accession":"SRP194984"},{"namespace":"insdc_study","accession":"PRJNA541014"}],"estimated_cell_count":9223372036854774784,"estimated_cell_count_":null}]},"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","source":{"id":"4c15703d-2502-4863-8d82-fc1b6ccf65f0","spec":"tdr:datarepo-dev-fef02a92:snapshot/hca_dev_0d4b87ea6e9e456982e41343e0e3259f__20210827_20210903:/0"},"bundle_uuid":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","bundle_version":"2021-04-07T04:59:36.152000Z","bundle_deleted":false}
[DEBUG] 2022-06-24T16:40:19.588Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch < {"_index":"azul_v2_abrahamsc_dcp2_projects","_type":"_doc","_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
[INFO] 2022-06-24T16:40:19.792Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch PUT https://search-azul-index-sandbox-mcwjphhhdivigzrsrdmxm2uude.us-east-1.es.amazonaws.com:443/azul_v2_abrahamsc_dcp2_bundles/_doc/8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04%3A59%3A36.152000Z_exists?op_type=create&refresh=false [status:201 request:0.202s]
[DEBUG] 2022-06-24T16:40:19.792Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch > {"entity_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","contents":{"sample_specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"samples":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","entity_type":"specimens","organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"model_organ":"~null","model_organ_part":"~null","effective_organ":"brain"}],"sequencing_inputs":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","sequencing_input_type":"cell_suspension"}],"specimens":[{"document_id":"8a78fb4d-e0ab-4449-9324-50a5fee74327","biomaterial_id":"mouse_brain_specimen","has_input_biomaterial":"~null","_source":"specimen_from_organism","disease":["normal"],"organ":"brain","organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"],"storage_method":"~null","preservation_method":"~null","_type":"specimen"}],"cell_suspensions":[{"document_id":"94ca3100-fdf2-4514-b0af-817ccec8919f","biomaterial_id":"mouse_brain_cells","total_estimated_cells":1400,"total_estimated_cells_":1400,"selected_cell_type":["~null"],"organ":["brain"],"organ_part":["cerebral cortex","hippocampal formation","neural tube ventricular layer"]}],"cell_lines":[],"donors":[{"document_id":"d194868c-c19b-4dba-a4db-3debc5212302","biomaterial_id":"fetal_mouse_donor","biological_sex":"unknown","genus_species":["Mus musculus"],"development_stage":"Theiler stage 
26","diseases":["normal"],"organism_age":"~null","organism_age_unit":"~null"}],"organoids":[],"files":[{"document_id":"b3065dc0-bb6e-4d03-b24d-d97ac7bd8bd1","content-type":"binary/octet-stream; dcp-type=data; dcp-type=data","indexed":0,"name":"SRR9008425_1.fastq.gz","crc32c":"fb4d6dcf","sha256":"f146aa11345ee47bb64a689b270e934d3382384e0f2236a7f26437074ca14d60","size":10190894575,"size_":10190894575,"uuid":"9cdd9639-3a5c-4889-b5f4-6893f114af46","drs_path":"v1_4c15703d-2502-4863-8d82-fc1b6ccf65f0_15480fbe-180b-4e42-97f8-2bca7c59e756","version":"2021-04-22T13:36:38.450000Z","file_type":"sequence_file","file_format":"fastq.gz","content_description":["DNA sequence"],"is_intermediate":9223372036854774784,"file_source":"~null","_type":"file","read_index":"read1","lane_index":1,"lane_index_":1,"related_files":[]}],"analysis_protocols":[],"imaging_protocols":[],"library_preparation_protocols":[{"document_id":"b4ec8457-7819-4efa-83e4-c855c8ade49c","library_construction_approach":"cDNA library construction","nucleic_acid_source":"single cell"}],"sequencing_protocols":[{"document_id":"48020340-98ab-4c1a-b79e-c461fe7191f9","instrument_manufacturer_model":"ONT PromethION","paired_end":0}],"sequencing_processes":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218"}],"matrices":[],"contributed_analyses":[],"dates":[{"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","submission_date":"2021-04-07T04:59:36.152000Z","update_date":"2021-04-07T04:59:36.152000Z","last_modified_date":"2021-04-07T04:59:36.152000Z","aggregate_last_modified_date":"2021-04-26T02:28:36.139000Z","aggregate_submission_date":"2021-01-14T15:34:51.095000Z","aggregate_update_date":"2021-04-26T02:28:36.139000Z"}],"projects":[{"document_id":"0d4b87ea-6e9e-4569-82e4-1343e0e3259f","project_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","project_description":"Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell 
heterogeneity. However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. In consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows to generate high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.","project_short_name":"Nanopore_scSequencing","laboratory":[" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","CNRS, Institut de Pharmacologie Moléculaire et Cellulaire"],"institutions":["Université Côte d'Azur"],"contact_names":["Kevin,Lebrigand","Pascal,Barbry","Rainer,Waldmann","Virginie,Magnone"],"contributors":[{"contact_name":"Rainer,Waldmann","corresponding_contributor":1,"email":"rainer@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Pascal,Barbry","corresponding_contributor":1,"email":"barbry@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":" CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"},{"contact_name":"Virginie,Magnone","corresponding_contributor":9223372036854774784,"email":"~null","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et 
Cellulaire","project_role":"~null"},{"contact_name":"Kevin,Lebrigand","corresponding_contributor":1,"email":"lebrigand@ipmc.cnrs.fr","institution":"Université Côte d'Azur","laboratory":"CNRS, Institut de Pharmacologie Moléculaire et Cellulaire","project_role":"~null"}],"publication_titles":["High throughput error corrected Nanopore single cell transcriptome sequencing."],"publications":[{"publication_title":"High throughput error corrected Nanopore single cell transcriptome sequencing.","publication_url":"https://doi.org/10.1038/s41467-020-17800-6","official_hca_publication":9223372036854774784,"doi":"10.1038/s41467-020-17800-6"}],"supplementary_links":["https://github.com/ucagenomix/sicelore"],"_type":"project","accessions":[{"namespace":"geo_series","accession":"GSE130708"},{"namespace":"insdc_project","accession":"SRP194984"},{"namespace":"insdc_study","accession":"PRJNA541014"}],"estimated_cell_count":9223372036854774784,"estimated_cell_count_":null}]},"document_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","source":{"id":"4c15703d-2502-4863-8d82-fc1b6ccf65f0","spec":"tdr:datarepo-dev-fef02a92:snapshot/hca_dev_0d4b87ea6e9e456982e41343e0e3259f__20210827_20210903:/0"},"bundle_uuid":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218","bundle_version":"2021-04-07T04:59:36.152000Z","bundle_deleted":false}
[DEBUG] 2022-06-24T16:40:19.792Z 55c5987b-c4be-5479-8071-c3542fa0eb59 elasticsearch < {"_index":"azul_v2_abrahamsc_dcp2_bundles","_type":"_doc","_id":"8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_8f01b1d4-fc15-4e08-b1b2-c576f0ae7218_2021-04-07T04:59:36.152000Z_exists","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
dsotirho-ucsc commented 2 years ago

@hannes-ucsc: "It will be difficult to implement the log message truncation because the log message is emitted by Elasticsearch library code, not our own."
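
One possible way to truncate messages we do not emit ourselves is a standard `logging.Filter` attached to the library's logger. The sketch below is a minimal illustration, not the implemented design. It assumes the library logs under the logger name `elasticsearch` (as the excerpts above suggest), that "1k" means 1024 characters, and that level 0 should suppress everything below WARNING. `TruncatingFilter` and `configure_es_logging` are hypothetical names.

```python
import logging


class TruncatingFilter(logging.Filter):
    """Truncate the formatted message of every record that passes through."""

    def __init__(self, limit: int = 1024):
        super().__init__()
        self.limit = limit

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if len(message) > self.limit:
            # Replace the format string with the already formatted, truncated
            # text and clear the arguments so it is not formatted again.
            record.msg = message[:self.limit] + '...'
            record.args = ()
        return True  # never drop the record, only shorten it


def configure_es_logging(debug_level: int,
                         logger_name: str = 'elasticsearch') -> logging.Logger:
    """Apply the AZUL_DEBUG semantics requested in this issue (assumed mapping)."""
    logger = logging.getLogger(logger_name)
    # Remove filters installed by an earlier call so levels can be switched.
    for f in list(logger.filters):
        if isinstance(f, TruncatingFilter):
            logger.removeFilter(f)
    if debug_level <= 0:
        # Assumption: level 0 silences request and response logging.
        logger.setLevel(logging.WARNING)
    else:
        logger.setLevel(logging.DEBUG)
        if debug_level == 1:
            # Level 1 keeps only the first 1k of each message.
            logger.addFilter(TruncatingFilter(1024))
    return logger
```

A filter attached to a logger only sees records logged directly on that logger, not on its children, so this would need to be repeated for any child loggers the library uses.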

dsotirho-ucsc commented 11 months ago

@hannes-ucsc: "Spike to come up with design. Consider monkey-patching the ES client, forking it or adding our own log statement before every Elasticsearch request."
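
To make the monkey-patching option concrete, here is a rough sketch of wrapping a transport's request method so that a message is logged before the request is made. It is an illustration under assumptions, not the chosen design: `FakeTransport` is a hypothetical stand-in for the client's transport class, `log_before_request` is a made-up helper, and the real client's method name and signature would need to be checked against the installed library version.

```python
import functools
import logging

log = logging.getLogger(__name__)


def log_before_request(transport_cls, debug_level: int, limit: int = 1024):
    """Patch `transport_cls.perform_request` to log before delegating.

    Returns the original method so the patch can be undone.
    """
    original = transport_cls.perform_request

    @functools.wraps(original)
    def perform_request(self, method, url, *args, **kwargs):
        if debug_level > 0:
            # Assumption: the body is passed as a keyword argument.
            body = kwargs.get('body')
            text = '' if body is None else str(body)
            if debug_level == 1 and len(text) > limit:
                text = text[:limit] + '...'
            # Logged before the call, so a hung request is still visible.
            log.info('Making request: %s %s %s', method, url, text)
        return original(self, method, url, *args, **kwargs)

    transport_cls.perform_request = perform_request
    return original


class FakeTransport:
    """Hypothetical stand-in for the Elasticsearch client's transport."""

    def perform_request(self, method, url, body=None):
        return {'result': 'created'}


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    log_before_request(FakeTransport, debug_level=1)
    FakeTransport().perform_request('PUT', '/some_index/_doc/1', body='{}')
```

Compared with forking the client, a patch like this stays small, but it depends on an internal method of the library that could change between releases.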

hannes-ucsc commented 11 months ago

For the demo, show the logs in CloudWatch. Mention that AZUL_DEBUG=0 disables them completely, and explain why (test log volume).