claraqin / neonMicrobe

Processing NEON soil microbe marker gene sequence data into ASV tables.
GNU Lesser General Public License v3.0

Update the vignettes to remove chimeras AFTER merging sequence tables #20

Closed claraqin closed 4 years ago

claraqin commented 4 years ago

The current ITS and 16S sequence processing vignettes remove chimeras BEFORE merging the per-run sequence tables into one big sequence table. Ben Callahan recommends removing chimeras AFTER merging them. Zoey suggests that this could reduce Type I errors (false positives) in chimera identification.

zoey-rw commented 4 years ago

Hey @claraqin, @ptpell, I just realized that I already wrote a function to remove chimeras after merging sequence tables: https://github.com/zoey-rw/neonSoilMicrobes/blob/master/mergeSequenceTables.r The inputs are the file paths to all of the sequence tables. I can clean this up a bit and add it to the utils.R file.

zoey-rw commented 4 years ago

Update: I think this might not be a good switch after all. Reasoning:

1) Whenever new samples are processed, would the removeBimeraDenovo step have to be rerun on the entire NEON dataset? That would be impractical, so it would be more consistent to do it by sequencing run.

2) The real reason, though, is timing. With ~330,000 ASVs, the step took 82+ hours for this user (albeit without multithreading). Our dataset has ~1,290,000 fungal ASVs from the combined sequence tables. If computation time scales exponentially with the number of ASVs, this could quickly become impractical or impossible.
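For a rough sense of scale, the figures quoted above can be extrapolated under a few assumed scaling laws. This is a back-of-the-envelope sketch in Python, not a benchmark: the linear and quadratic exponents are assumptions chosen for illustration, and how the chimera-removal step actually scales is not established in this thread.

```python
# Figures quoted in the comment above.
REFERENCE_ASVS = 330_000    # ASVs in the reported run
REFERENCE_HOURS = 82.0      # reported lower bound, without multithreading
TARGET_ASVS = 1_290_000     # fungal ASVs in the combined NEON tables


def extrapolate_hours(exponent: float) -> float:
    """Estimate runtime assuming time grows as (number of ASVs) ** exponent.

    The exponent is a hypothetical parameter; it is not measured here.
    """
    ratio = TARGET_ASVS / REFERENCE_ASVS
    return REFERENCE_HOURS * ratio ** exponent


estimates = {
    "linear": extrapolate_hours(1.0),
    "quadratic": extrapolate_hours(2.0),
}

for label, hours in estimates.items():
    print(f"{label}: ~{hours:,.0f} hours (~{hours / 24:.0f} days)")
```

Because the 82-hour figure is itself a lower bound, these estimates are lower bounds too. Even the linear assumption gives roughly two weeks of single-threaded compute, and anything superlinear is considerably worse.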

claraqin commented 4 years ago

Hi Zoey, that's a great point... @ptpell do you have thoughts on the time and memory demands for removing chimeras after merging the sequence tables together?

mykophile commented 4 years ago

I’m also not sure I see the logic in searching for chimeras after merging sequence tables. Presumably each sequence table comes from one MiSeq run. Chimeras form during PCR, so their parents should be limited to ITS / 16S types that were together in the same PCR reactions and sequencing run. Looking for chimeras across runs just seems likely to produce false positives.


Kabir Peay Associate Professor Dept. of Biology Stanford University (650) 723-0552
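The false-positive concern above can be illustrated with a toy example. The sketch below is not dada2's algorithm: it uses a deliberately simplified, exact-match rule (a sequence is flagged if it is a left piece of one more-abundant sequence joined to a right piece of another), and the sequences, counts, and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def is_toy_bimera(seq: str, parents: List[str]) -> bool:
    """Return True if seq is an exact left/right join of two parent sequences.

    Toy criterion only: real tools align sequences and tolerate mismatches.
    """
    same_length = [p for p in parents if len(p) == len(seq) and p != seq]
    for i in range(1, len(seq)):
        has_left = any(p[:i] == seq[:i] for p in same_length)
        has_right = any(p[i:] == seq[i:] for p in same_length)
        if has_left and has_right:
            return True
    return False


def flag_toy_bimeras(counts: Dict[str, int]) -> Set[str]:
    """Flag sequences explainable by two more-abundant sequences in the same table."""
    flagged = set()
    for seq, n in counts.items():
        parents = [p for p, m in counts.items() if m > n]
        if is_toy_bimera(seq, parents):
            flagged.add(seq)
    return flagged


# Hypothetical read counts for two separate sequencing runs.
run_a = {"AAAAAAAA": 100, "CCCCCCCC": 80}
run_b = {"AAAACCCC": 10, "GGGGGGGG": 90}

per_run_flags = flag_toy_bimeras(run_a) | flag_toy_bimeras(run_b)
pooled_flags = flag_toy_bimeras(dict(Counter(run_a) + Counter(run_b)))

print("flagged per run:", sorted(per_run_flags))
print("flagged after pooling:", sorted(pooled_flags))
```

In this toy setup, `AAAACCCC` never shared a PCR reaction with its apparent parents, so it cannot be a chimera of them, yet pooling the runs makes it look like one. Checking each run separately leaves it alone.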

ptpell commented 4 years ago

Hi all,

Removing chimeras after merging tables is the recommendation of Ben Callahan. I do not know the rationale for this and have not been able to find an explanation in a thread.

Zoey, you are right: the computational demands of removing chimeras after merging make it essentially intractable on tables this big. You would need several high-memory nodes on a cluster. I have been trying and failing to get it to work.

Sounds like we all agree to remove chimeras on a sequencing run basis.
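The agreed ordering (remove chimeras within each sequencing run, then merge) can be sketched as follows. This is a minimal illustration in plain Python rather than the package's actual code: the table layout, the function names, and the `stand_in_detector` are all hypothetical placeholders for whatever per-run chimera detection is used.

```python
from typing import Callable, Dict, Iterable, Set

SeqTable = Dict[str, Dict[str, int]]  # sample ID -> {sequence: read count}


def remove_chimeras_per_run(
    table: SeqTable, find_chimeras: Callable[[SeqTable], Set[str]]
) -> SeqTable:
    """Drop every sequence that the detector flags within this one run."""
    flagged = find_chimeras(table)
    return {
        sample: {seq: n for seq, n in counts.items() if seq not in flagged}
        for sample, counts in table.items()
    }


def merge_tables(tables: Iterable[SeqTable]) -> SeqTable:
    """Combine already-cleaned runs; sample IDs are assumed unique across runs."""
    merged: SeqTable = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


def stand_in_detector(table: SeqTable) -> Set[str]:
    """Hypothetical detector: flags sequences whose label starts with 'CHIM'."""
    return {seq for counts in table.values() for seq in counts if seq.startswith("CHIM")}


run_1 = {"sampleA": {"ACGT": 50, "CHIM1": 3}}
run_2 = {"sampleB": {"ACGT": 40, "TTGA": 7}}
cleaned = [remove_chimeras_per_run(r, stand_in_detector) for r in (run_1, run_2)]

# A later run is cleaned on its own; earlier runs are not reprocessed.
run_3 = {"sampleC": {"GGCC": 12, "CHIM2": 1}}
cleaned.append(remove_chimeras_per_run(run_3, stand_in_detector))

merged = merge_tables(cleaned)
print(merged)
```

With this ordering, the cost of chimera removal grows with the size of each run rather than with the whole dataset, and adding new samples only requires cleaning the new run before re-merging.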
