KastanDay / ML4Bio

LLMs to execute Bioinformatics workflows, esp. RNA-seq
MIT License

Sweep: Create a full command line executable workflow for RNA-Seq on PBMC Samples. Open a new pull request (on a separate branch) and comment the PR number here when you're done #42

Open KastanDay opened 11 months ago

KastanDay commented 11 months ago

Experiment Type: RNA-Seq Sequencing of total cellular RNA

Workflow Management: Bash/SLURM Scripting and job scheduling

Software Stack: FastQC MultiQC STAR RSEM samtools DESeq2

What else to know about the pipeline? I am working with PBMC samples collected from patients who are undergoing immunotherapy.

Use the existing data files in `Report_WholeBrain` as input for this workflow.

You should write a series of bash scripts and R scripts that can accomplish this task. Open a PR with those scripts when you're done.
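
One way the requested Bash/SLURM chain could be wired together is sketched below. It is an illustration only, not the scripts from the PR: the step script names are placeholders following the `<tool>_PBMC.sh` pattern, and the `submit_step` and `submit_chain` helpers and the `DRY_RUN` variable are hypothetical names. `sbatch --parsable` and `--dependency=afterok:<jobid>` are standard SLURM options. With `DRY_RUN=1` (the default in this sketch) the commands are printed instead of submitted.

```shell
#!/bin/bash
# Sketch: submit each pipeline step as a SLURM job that waits for the previous one.
# Script names and helper names are hypothetical; adjust them to the real files.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print sbatch commands instead of submitting them

# Build (and optionally run) one sbatch command.
# $1 = script to submit, $2 = job ID to wait for (may be empty).
# Prints the job ID of the submitted job on stdout.
submit_step() {
    local script="$1" after="${2:-}"
    local cmd=(sbatch --parsable)
    if [[ -n "$after" ]]; then
        cmd+=(--dependency="afterok:${after}")
    fi
    cmd+=("$script")
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "would run: ${cmd[*]}" >&2
        echo "dry-${script%.*}"   # fake job ID so the chain can still be shown
    else
        "${cmd[@]}"
    fi
}

# Submit all steps in order, each depending on the one before it.
# Prints the job ID of the last step.
submit_chain() {
    local prev="" script
    for script in "$@"; do
        prev="$(submit_step "$script" "$prev")"
    done
    echo "$prev"
}

submit_chain FastQC_PBMC.sh MultiQC_PBMC.sh STAR_PBMC.sh RSEM_PBMC.sh samtools_PBMC.sh
```

The DESeq2 step is left out of the chain because `sbatch` expects a batch script; an R script would need a small wrapper that calls `Rscript`.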

Checklist
- [X] Create `FastQC_PBMC.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/55a5fe3a181818ffdddfb0b8068ba5fe5db92a63
- [X] Running GitHub Actions for `FastQC_PBMC.sh` ✓
- [X] Create `MultiQC_PBMC.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/d21b4586f2a72e5e4e25d9bbafd2d49fe1dc8342
- [X] Running GitHub Actions for `MultiQC_PBMC.sh` ✓
- [X] Create `STAR_PBMC.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/183a5b3e90911f8e8cd122f992e80d0a41eb5b00
- [X] Running GitHub Actions for `STAR_PBMC.sh` ✓
- [X] Create `RSEM_PBMC.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/bc3bdaf4c13ec9a1992f5912a2d6f8e27f85275b
- [X] Running GitHub Actions for `RSEM_PBMC.sh` ✓
- [X] Create `samtools_PBMC.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/1636c79eda01d04bab8fc199ab9d20a1bdc6a1e8
- [X] Running GitHub Actions for `samtools_PBMC.sh` ✓
- [X] Create `DESeq2_PBMC.R` ✓ https://github.com/KastanDay/ML4Bio/commit/065dbf54cefa56303d7ad6ed1b4d159486389bac
- [X] Running GitHub Actions for `DESeq2_PBMC.R` ✓
- [X] Modify `main.sh` ✓ https://github.com/KastanDay/ML4Bio/commit/887693d1d362d424abeebe7382c71a7653069224
- [X] Running GitHub Actions for `main.sh` ✓
- [X] Modify `DESeq2.R` ✓ https://github.com/KastanDay/ML4Bio/commit/37fb8dbb83e944d6d87a751ffab2d580d10f2e87
- [X] Running GitHub Actions for `DESeq2.R` ✓

![Flowchart](https://raw.githubusercontent.com/KastanDay/ML4Bio/sweep/assets/4837c79f7d3c5ae2114695915e130083589df4b046fc7a2e8a212b8573c57d88_42_flowchart.svg)

sweep-ai[bot] commented 11 months ago

Here's the PR! https://github.com/KastanDay/ML4Bio/pull/43. See Sweep's process at dashboard.

Sweep Basic Tier: I'm using GPT-4. You have 5 GPT-4 tickets left for the month and 3 for the day. (tracking ID: 4be44b0fa6)



Sandbox Execution ✓

Here are the sandbox execution logs prior to making any changes:

Sandbox logs for 29691d5
Checking main.sh for syntax errors...
✅ main.sh has no syntax errors!

Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.
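
The sandbox log above does not show how the check is implemented. A check of this kind can be approximated locally with `bash -n`, which parses a script without executing it. The following is a minimal sketch under that assumption; the function name `check_syntax` is hypothetical and the messages only imitate the log.

```shell
#!/bin/bash
# Sketch: parse-only syntax check for shell scripts, similar in spirit to the
# sandbox log above. Whether Sweep itself uses `bash -n` is an assumption.

# Check each script given as an argument.
# Returns non-zero if any of them fails to parse.
check_syntax() {
    local status=0 script
    for script in "$@"; do
        echo "Checking ${script} for syntax errors..."
        if bash -n "$script"; then
            echo "✅ ${script} has no syntax errors!"
        else
            echo "❌ ${script} has syntax errors." >&2
            status=1
        fi
    done
    return "$status"
}

# Example: check main.sh if it exists in the current directory.
if [[ -f main.sh ]]; then
    check_syntax main.sh
fi
```

Note that `bash -n` only catches parse errors; it does not detect missing tools, wrong paths, or failing commands.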


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If a file is missing from this list, you can mention its path in the ticket description.
- https://github.com/KastanDay/ML4Bio/blob/29691d52f0c1cab29bedb7d5358fefe22a1f62a8/main.sh#L1-L7
- https://github.com/KastanDay/ML4Bio/blob/29691d52f0c1cab29bedb7d5358fefe22a1f62a8/Report_WholeBrain/SampleInfo_RUVvariables_WholeBrain_2022-05-12.csv#L1-L10

I also found the following external resources that might be helpful:

**Summaries of links found in the content:**

https://github.com/KastanDay/ML4Bio/tree/main/Report_WholeBrain: The page provides information about an RNA-Seq experiment and the software stack used for analysis. The workflow management is done using Bash/SLURM for scripting and job scheduling. The software stack includes FastQC, MultiQC, STAR, RSEM, samtools, and DESeq2. The user is working with PBMC samples from patients undergoing immunotherapy. The page also mentions a directory called "Report_WholeBrain" on GitHub, which contains the data files to be used as input for the workflow. The user is instructed to write a series of bash scripts and R scripts to accomplish the task and open a pull request with those scripts when done.

Step 2: ⌨️ Coding

Ran GitHub Actions for 55a5fe3a181818ffdddfb0b8068ba5fe5db92a63:

Ran GitHub Actions for d21b4586f2a72e5e4e25d9bbafd2d49fe1dc8342:

Ran GitHub Actions for 183a5b3e90911f8e8cd122f992e80d0a41eb5b00:

Ran GitHub Actions for bc3bdaf4c13ec9a1992f5912a2d6f8e27f85275b:

Ran GitHub Actions for 1636c79eda01d04bab8fc199ab9d20a1bdc6a1e8:

Ran GitHub Actions for 065dbf54cefa56303d7ad6ed1b4d159486389bac:


+++ @@ -1,9 +1,9 @@

 #!/bin/bash

 # Execute the entire workflow

-bash FastQC.sh
-bash MultiQC.sh
-bash STAR.sh
-bash RSEM.sh
-bash samtools.sh
-Rscript DESeq2.R
+bash FastQC_PBMC.sh
+bash MultiQC_PBMC.sh
+bash STAR_PBMC.sh
+bash RSEM_PBMC.sh
+bash samtools_PBMC.sh
+Rscript DESeq2_PBMC.R
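
As far as the diff shows, `main.sh` has no error handling, so a failed step would not stop the later steps from running on missing output. A minimal sketch of a stricter driver follows. It reuses the script names from the diff and assumes they sit in the same directory as `main.sh`; the `run_step` and `run_pipeline` helpers, the `STEPS_DIR` variable, and the `--run` flag are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch of a stricter main.sh: stop at the first failing step.
set -euo pipefail

# Directory holding the step scripts (assumption: same directory as this file).
STEPS_DIR="${STEPS_DIR:-$(cd "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd)}"

# Run one step with the given interpreter, logging start and finish.
# $1 = interpreter (bash or Rscript), $2 = script file name.
run_step() {
    local interpreter="$1" script="$STEPS_DIR/$2"
    if [[ ! -f "$script" ]]; then
        echo "missing step script: $script" >&2
        return 1
    fi
    echo "[$(date '+%H:%M:%S')] starting $2"
    # Explicit check so a failure is reported even where `set -e` is inactive.
    "$interpreter" "$script" || { echo "step failed: $2" >&2; return 1; }
    echo "[$(date '+%H:%M:%S')] finished $2"
}

# Execute the entire workflow in order, stopping at the first failure.
run_pipeline() {
    run_step bash    FastQC_PBMC.sh   || return 1
    run_step bash    MultiQC_PBMC.sh  || return 1
    run_step bash    STAR_PBMC.sh     || return 1
    run_step bash    RSEM_PBMC.sh     || return 1
    run_step bash    samtools_PBMC.sh || return 1
    run_step Rscript DESeq2_PBMC.R    || return 1
}

# Run only when asked to, so the file can also be sourced for its functions.
if [[ "${1:-}" == "--run" ]]; then
    run_pipeline
fi
```

Invoked as `bash main.sh --run`, this would execute the six steps in the same order as the diff and exit non-zero as soon as one of them fails.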

Ran GitHub Actions for 887693d1d362d424abeebe7382c71a7653069224:


+++ @@ -7,14 +7,7 @@

 # Specify conditions
 condition <- factor(c('condition1', 'condition2'))

-# Create DESeqDataSet
-dds <- DESeqDataSetFromMatrix(countData = countData, colData = data.frame(condition), design = ~ condition)

-# Run DESeq
-dds <- DESeq(dds)
+source('DESeq2_PBMC.R')
+# Write results to file

-# Get results
-res <- results(dds)

-# Write results to file
-write.csv(as.data.frame(res), file='DESeq2_results.csv')

Ran GitHub Actions for 37fb8dbb83e944d6d87a751ffab2d580d10f2e87:


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/pbmc-rnaseq-workflow.




💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on the pull request.