robinsleith opened 1 year ago
Hmmm. Empty tibbles by themselves are not a problem (and can be quite handy.) Do you have 5 minutes to point your screen at me? (I'm headed out for a week of vacation in about 2h).
Let's table this until you're back; I'll have to rendezvous with Rene to get an example dataset together. Have a nice vacation!!!!
Should I be putting my dada2 boots on?
Ha! Thanks for this reminder email. Yeah... we should all probably put our heads together for this one. We just got results back from Andre saying he wasn't concerned about our data even though it looked questionable to us. I've got to do a little more digging to compare our processing pipelines.
When works for you two?
We can do anytime this afternoon (before 4).
Bah - a lost day. Sorry about that. What are the prospects for today?
I'm around 11-4 today
1-4 ideal for me!
This may be unrelated, but since we were recently messing with track.csv for this issue I'm posting here. I tried to run the PacBio pipeline, which does not have reverse reads, and got the following error from the track function.
```
Error:
! Tibble columns must have compatible sizes.
• Size 390: Existing data.
• Size 0: Column `denoised_reverse`.
ℹ Only values of size one are recycled.
Backtrace:
     ▆
  1. └─global main(CFG)
  2.   ├─readr::write_csv(make_track(), file.path(CFG$output_path, "track.csv"))
  3.   │ └─readr::write_delim(...)
  4.   │   ├─base::stopifnot(is.data.frame(x))
  5.   │   └─base::is.data.frame(x)
  6.   └─make_track()
  7.     └─dplyr::tibble(...)
  8.       └─tibble:::tibble_quos(xs, .rows, .name_repair)
  9.         └─tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])
 10.           └─rlang::cnd_signal(error_incompatible_size(n, name, size, "Existing data"))
Execution halted
```
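If it helps narrow this down, here's a minimal reproduction of that tibble recycling error, plus one possible guard. The `pad_stat` helper is hypothetical (not something in the pipeline), just a sketch of substituting a row-length vector of NAs for any stat that comes back empty:

```r
library(tibble)

# Minimal reproduction: a zero-length column cannot be recycled
# against columns of length > 1 (only size-one values are recycled).
# tibble(input = c(10L, 20L), denoised_reverse = integer(0))
# -> Error: Tibble columns must have compatible sizes.

# Possible guard (hypothetical helper): replace any zero-length
# stat vector with NAs matching the expected row count.
pad_stat <- function(x, n) {
  if (length(x) == 0) rep(NA_integer_, n) else x
}

n <- 2
tibble(
  input            = c(10L, 20L),
  denoised_reverse = pad_stat(integer(0), n)
)
```

Wrapping each stat in something like `pad_stat()` inside `make_track()` would let the single-end PacBio run produce a track.csv with NA (or 0) in the reverse-read columns instead of crashing.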
Ok, now I am getting this as output. I understand why NA shows up for things like merged or other columns that require reverse reads, but what's up with nonchim and final_prr? The dataset is here: /mnt/storage/data/edna/dada/projects/shane/april_pacbio/process
```
name,input,filtered,denoised_forward,denoised_reverse,nonchim,final_prr
000,63282,59626,NA,NA,NA,NA
000,92287,87552,NA,NA,NA,NA
000,54462,51809,NA,NA,NA,NA
002,53805,51177,NA,NA,NA,NA
002,37988,36206,NA,NA,NA,NA
002,37779,36012,NA,NA,NA,NA
003,22202,21168,NA,NA,NA,NA
003,16733,15900,NA,NA,NA,NA
003,14836,14165,NA,NA,NA,NA
004,30789,29405,NA,NA,NA,NA
004,43792,41718,NA,NA,NA,NA
004,46460,44160,NA,NA,NA,NA
```
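A guess at the nonchim/final_prr NAs: if those columns are computed arithmetically from an upstream count that is NA for single-end data (e.g. a merged-reads count that only exists with reverse reads), the NA simply propagates. A tiny illustration (the variable names here are assumptions, not necessarily what the pipeline uses):

```r
# NA propagates through arithmetic, so any column derived from a
# missing upstream count also comes out NA:
merged    <- NA_integer_     # no merge step for single-end PacBio reads
nonchim   <- merged - 0L     # NA
final_prr <- nonchim / 63282 # NA
```

If that's what's happening, the fix is to base nonchim and final_prr on the last non-NA stage (denoised_forward for single-end runs) rather than on merged.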
I feel like we solved this, but I think we (correctly, for triage) focused on making sure these samples didn't crash the pipeline. Did we ever get those empty samples captured in track.csv? I think folks would like an accounting of what happened to every file they feed into the pipeline, which currently does not happen for files that have no reads after cutadapt...
As more folks get sequence data back, we have run into the issue of "blank" samples breaking the pipeline. These are samples run through the wet-lab process with no DNA, to test for contamination at various steps of the sequencing process. We often find that these files have very few reads, and many have no reads after filterAndTrim. I think our method of collecting stats in a tibble causes a crash when there are no files, or nothing in a set of files. Do you have an idea how to make the pipeline robust to these samples? Ideally a sample with no reads after filterAndTrim would just get recorded in the final track.csv as having 0 reads. I don't think we can pass empty files to downstream functions, so we will have to figure out how to exclude those files without crashing the pipeline. Happy to chat more! @rfrancolini do you mind putting together a few samples we can test this on? You had examples of blanks or failed samples crashing the pipeline, right?
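To make that concrete, here's one rough sketch of how empty samples could stay in track.csv while being excluded downstream. Column names are assumed from the track.csv example above, and the sample data is invented; this is a sketch, not the pipeline's actual code:

```r
library(tibble)
library(dplyr)

# Invented example: one real sample and one blank that empties out
# after filtering.
track <- tibble(
  name     = c("000", "blank-01"),
  input    = c(63282L, 15L),
  filtered = c(59626L, 0L)
)

# Every input file stays in the track table, so the blank is recorded
# with 0 reads rather than silently dropped...
# readr::write_csv(track, "track.csv")

# ...but only non-empty samples are handed to the downstream dada2 steps.
nonempty <- filter(track, filtered > 0)
```

The key design point is to build the track table from the full input file list first, then subset for the dada2 calls, instead of building it from whatever files survive filtering.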