Open jeet-vora opened 2 months ago
@seankim658
Loading, copying, exporting, and deleting data, as well as metadata operations, are free up to certain limits. BigQuery also has a free usage tier.
To estimate the cost of running a query, calculate the bytes processed by various queries, and get a monthly cost estimate based on your projected usage, see: https://cloud.google.com/bigquery/docs/best-practices-costs
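A rough sketch of how that estimate could be scripted, in case it helps. It assumes the `google-cloud-bigquery` Python client and application default credentials for the dry run. The price per TiB and the monthly free allowance below are placeholder assumptions, not values from this thread, so check the current pricing page before relying on them. The helper names are made up for illustration.

```python
# Placeholder assumptions: confirm both numbers on the BigQuery pricing page.
ASSUMED_USD_PER_TIB = 6.25          # assumed on-demand price per TiB scanned
ASSUMED_FREE_TIB_PER_MONTH = 1.0    # assumed monthly free query allowance
BYTES_PER_TIB = 2 ** 40


def dry_run_bytes(sql: str) -> int:
    """Return the bytes a query would process, without actually running it."""
    # Imported lazily so the cost helpers below work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up application default credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def query_cost_usd(bytes_processed: int,
                   usd_per_tib: float = ASSUMED_USD_PER_TIB) -> float:
    """Cost of a single query, ignoring any free allowance."""
    return bytes_processed / BYTES_PER_TIB * usd_per_tib


def monthly_cost_usd(bytes_per_query: int,
                     queries_per_month: int,
                     usd_per_tib: float = ASSUMED_USD_PER_TIB,
                     free_tib: float = ASSUMED_FREE_TIB_PER_MONTH) -> float:
    """Projected monthly cost after subtracting the assumed free allowance."""
    total_tib = bytes_per_query * queries_per_month / BYTES_PER_TIB
    billable_tib = max(0.0, total_tib - free_tib)
    return billable_tib * usd_per_tib
```

Typical use would be `monthly_cost_usd(dry_run_bytes(sql), queries_per_month=30)`. A dry run is not billed, so it is a cheap way to check a query before running it for real.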
Data from TCGA projects are organized into two tiers: Open Access and Controlled Access.
The Open Access data tier contains data that cannot be attributed to an individual research participant and does not require user certification. Data in the Open Access tier are available in the TCGA Data Portal.
The Controlled Access data tier contains individual-level genotype data that are unique to an individual. Access to data in this tier requires user certification through dbGaP Authorized Access, mentioned above, and is subject to the 2023 Data Use Certification Agreement. Here's the summary:
1. Introduction and Statement of Policy
2. Terms of Access
2.1. Research Use
Cloud computing use requires specific permissions.
2.2. Requester and Approved User Responsibilities
Annual progress updates and project renewals are required.
2.3. Public Posting of Approved Users’ Research Use Statement
The PI agrees to publicly post information about themselves, their approved research use, and related details on the dbGaP website, including project specifics and citations of resulting publications.
2.4. Non-Identification
Identifiable information can only be used with specific IRB approval.
2.5. Certificate of Confidentiality
This certificate protects sensitive information in NIH databases from being disclosed in legal proceedings or to unauthorized individuals. Disclosure is only permitted under specific conditions, such as with the individual’s consent or for medical treatment.
2.6. Non-Transferability
NIH controlled-access datasets and their derivatives must be retained by the approved users and cannot be distributed to unauthorized entities or individuals, ensuring data security and compliance with NIH policies.
2.7. Data Security and Unauthorized Data Release
The Requester and Approved Users are responsible for managing and protecting controlled-access datasets according to NIH security practices, and for promptly reporting any unauthorized data sharing or breaches.
2.8. Policy Compliance Violations
NIH may terminate data access if the requester violates the NIH GDS Policy, Data Use Certification Agreement, or Genomic Data User Code of Conduct, and requires prompt notification and remediation of any unauthorized data sharing or breaches.
2.9. Intellectual Property
The Requester and Approved Users acknowledge that anyone with access to the data is expected to follow the intellectual property principles.
2.10. Dissemination of Research Findings and Acknowledgement of Controlled-Access Datasets Subject to the NIH GDS Policy
Approved Users are encouraged to widely disseminate research findings from NIH-controlled datasets through publications and presentations, and must acknowledge the original data contributors and funding sources in all disclosures.
2.11. Research Use Reporting
The PI must provide annual progress updates, including data usage, publications, future research plans, and any policy violations, as part of the project renewal or close-out process.
2.12. Non-Endorsement, Indemnification
The NIH and data contributors do not guarantee the accuracy or reliability of the data and are not liable for any loss or damage resulting from its use.
2.13. Termination and Data Destruction
According to the NIH Bioinformatics Training and Education Program, TCGA comprises genomic, epigenomic, transcriptomic, and proteomic data, combined with rich clinical information and related metadata, from over 11,000 patients representing 33 cancer types.
For TCGA on Google BigQuery, check