bensutherland / eDNA_metabarcoding

Pipeline to analyze eDNA metabarcoding samples (PE and SE, demultiplexed, multiplexed)

00b_split_by_type.sh - grepping issue #8

Closed midwesternmouse closed 2 years ago

midwesternmouse commented 2 years ago

Hello! I am so happy to have found this page - I have been trying to separate demultiplexed sequences using ngsfilter for so long, and this code has been so useful!

I have unfortunately hit a problem when trying to grep sequences using the tags attached via ngsfilter. I finally got my code to work (and grep!), but it grepped 0 entries 😢

Here is my code so far -- I've modified it to work within my environment.

# activate virtual env
source obi3-env/bin/activate
cd {folder where all the files are}

# import a few sequences to test code functionality
obi import raw_sequences/E-AFR090512_S75_L001_R1_001.fastq.gz test_072022/E-AFR090512_S75_L001_R1
obi import raw_sequences/E-AFR090512_S75_L001_R2_001.fastq.gz test_072022/E-AFR090512_S75_L001_R2
obi import raw_sequences/E-ALM180712_S34_L001_R1_001.fastq.gz test_072022/E-ALM180712_S34_L001_R1
obi import raw_sequences/E-ALM180712_S34_L001_R2_001.fastq.gz test_072022/E-ALM180712_S34_L001_R2
obi import raw_sequences/E-ANE040812_S36_L001_R1_001.fastq.gz test_072022/E-ANE040812_S36_L001_R1
obi import raw_sequences/E-ANE040812_S36_L001_R2_001.fastq.gz test_072022/E-ANE040812_S36_L001_R2
# for some reason, obi import does NOT want to work within a for-loop, so I just do it manually

# import the ngsfilter file w/ info on sequences
obi import --ngsfilter baboon_diet_ngsfilter.txt test_072022/ngsfile

# check if import worked
obi ls test_072022

# create file with all the sample names to use for for-loops throughout pipeline
ls *_R1_001.fastq.gz | cut -c -23 > ../samples_R1
ls *_R2_001.fastq.gz | cut -c -23 > ../samples_R2

# add primer tags using ngsfilter
for sample in $(cat samples_R1)
do
    echo "On sample: $sample"
    obi ngsfilter -t ngsfile -u test_072022/unidentified_${sample} test_072022/${sample} test_072022/identified_${sample}
done

# separate samples using ngsfilter and grep
# first test it using one sample before putting it in a for-loop
obi grep -E -A3 'sample\=trnl' test_072022/identified_E-AFR090512_S75_L001_R1 | obi grep -vE '^--$' - > trnl_E-AFR090512_S75_L001_R1
# error codes: "error: unrecognized arguments: -s", "error: unrecognized arguments: -E", "error: argument -v/--invert-selection: ignored explicit argument 'E'"

obi grep -S 'trnl' test_072022/identified_E-AFR090512_S75_L001_R1 | obi grep '^--$' - > trnl_E-AFR090512_S75_L001_R1
# error codes: "error: the following arguments are required: OUTPUT", "ValueError: unknown url type: '^--$'", "FileNotFoundError: [Errno 2] No such file or directory: '^--$'"

obi grep -S 'trnl' test_072022/identified_E-AFR090512_S75_L001_R1 trnl_E-AFR090512_S75_L001_R1
# results: "2022-06-30 19:36:26,770 [grep : INFO ] Grepped 0 entries"

So I've struggled to figure out how to grep sequences individually within my file and filter them into a new file. Is the original script formatted for obitools, and not obitools3? The grep is different (i.e., including 'obi').

Any help would be massively appreciated, thank you!!
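The thread never establishes why `obi import` failed inside a for-loop, so the following is only a sketch of one way to avoid typing each import by hand. It derives the sample name by stripping the `_001.fastq.gz` suffix instead of counting 23 characters, and by default it only prints the commands so they can be inspected first. The function names `sample_name` and `import_all` and the `DRY_RUN` variable are hypothetical, introduced here for illustration; the `obi import <file> <dms>/<view>` form is copied from the commands in the post above.

```shell
#!/usr/bin/env bash
# Derive the sample name from a FASTQ path by stripping the directory
# and the fixed "_001.fastq.gz" suffix (no character counting needed).
sample_name() {
    local base="${1##*/}"
    printf '%s\n' "${base%_001.fastq.gz}"
}

# Print (DRY_RUN=1, the default) or run one `obi import` per FASTQ file.
# Arguments: <dms_name> <fastq> [<fastq> ...]
# Hypothetical helper; the command form mirrors the manual imports above.
import_all() {
    local dms="$1"
    shift
    local fq name
    for fq in "$@"; do
        name="$(sample_name "$fq")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'obi import %s %s/%s\n' "$fq" "$dms" "$name"
        else
            obi import "$fq" "$dms/$name"
        fi
    done
}
```

A dry run would look like `import_all test_072022 raw_sequences/*_R[12]_001.fastq.gz`; setting `DRY_RUN=0` runs the imports for real, assuming `obi` is on the PATH of the active environment.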

bensutherland commented 2 years ago

Hello @midwesternmouse , Are you able to provide an example file that causes the error (is it test_072022/identified_E-AFR090512_S75_L001_R1?), the version of any relevant software being used (obitools, other?), and the operating system? This pipeline hasn't been maintained or updated for a little while, so your question may be best directed towards the obitools developers, but I will help if I can!

Ben

midwesternmouse commented 2 years ago

Hi @bensutherland !

I'm currently using The OBITools 3 - Version 3.0.1b17 on MacOS 10.15.7.

I think the file used in the command that causes the error (test_072022/identified_E-AFR090512_S75_L001_R1) is a view within the obitools3 environment. I'm not fully sure what file type it is. Checking now, though, I see that it, too, has 0 entries, which could explain why 0 entries were grepped: identified_E-BOY210712_S47_L001_R1: Date created: Thu Jun 30 18:42:36 2022 ; Line count: 0.
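Since an empty input view guarantees an empty grep result, it can help to list every view with a zero line count before going further. The sketch below is a hypothetical helper (`empty_views`) that filters listing text on stdin. It assumes each view appears on a single line shaped like the one quoted above, `<view_name>: Date created: <date> ; Line count: <n>`; the real `obi ls` output may be laid out differently depending on the version.

```shell
#!/usr/bin/env bash
# Read listing text on stdin and print the name of every view whose line
# count is 0. ASSUMPTION: each view appears on one line shaped like
#   <view_name>: Date created: <date> ; Line count: <n>
# as in the line quoted in the post; other layouts will not be matched.
empty_views() {
    awk '/Line count: 0[[:space:]]*$/ {
        name = $1
        sub(/:$/, "", name)   # drop the colon that follows the view name
        print name
    }'
}
```

Under that assumption, `obi ls test_072022 | empty_views` would print one empty view name per line.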

This brings up a whole new issue. After running the code

for sample in $(cat samples_R1)
do
    echo "On sample: $sample"
    obi ngsfilter -t ngsfile -u test_072022/unidentified_${sample} test_072022/${sample} test_072022/identified_${sample}
done

I am left with files w/ Line count: 0. Is this something you have encountered before with ngsfilter?

I apologize, there is now a completely different problem at hand!
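The cause of the empty ngsfilter output is not resolved anywhere in this thread. Two details of the loop are still worth ruling out: the sample list was written to `../samples_R1` but is read from `samples_R1`, and the tag file was imported as `test_072022/ngsfile` but is passed as a bare `ngsfile`. The sketch below is a hypothetical helper (`ngsfilter_cmds`) that refuses to run on a missing or empty list and prints the commands instead of executing them. Addressing the tag view as `<dms>/ngsfile` is an assumption based on the import command in the first post, not a confirmed fix.

```shell
#!/usr/bin/env bash
# Print one `obi ngsfilter` command per sample name read from a list file.
# Arguments: <dms_name> <samples_file>
# ASSUMPTION: the tag file view is addressed as <dms_name>/ngsfile, matching
# the name it was imported under; the loop in the thread passes "ngsfile".
ngsfilter_cmds() {
    local dms="$1" list="$2" sample
    if [ ! -s "$list" ]; then
        printf 'sample list missing or empty: %s\n' "$list" >&2
        return 1
    fi
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue   # skip blank lines
        printf 'obi ngsfilter -t %s/ngsfile -u %s/unidentified_%s %s/%s %s/identified_%s\n' \
            "$dms" "$dms" "$sample" "$dms" "$sample" "$dms" "$sample"
    done < "$list"
}
```

Reviewing the printed commands for one sample before running them makes a wrong path or view name visible early.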

midwesternmouse commented 2 years ago

Hi, just following up on this issue of Line count: 0 after assigning ngsfilter tags. I've modified my code so my imported files are kept as .fastq files, but all of my ngsfilter output files still come up as Line count: 0.

bensutherland commented 2 years ago

Hi @midwesternmouse , Unfortunately, I am not able to help with this issue. This pipeline was built around an earlier version of OBITools (I believe it was OBITools 1, but I am confirming this now). It is not currently being actively maintained or developed with new versions of OBITools.

I did try to follow the instructions here: https://git.metabarcoding.org/obitools/obitools3/-/wikis/Installing-the-OBITools3 to see if I could reproduce the error observed in your code with the demo data, but I was not successful even at installing the new OBITools.

Therefore, in this case, this issue would be best directed towards the developers of OBITools 3, who I believe will be much better suited to help you with this error or potential bug. My apologies that I cannot be of help.

I am going to add a flag to the pipeline README to indicate that it is only designed to work with OBITools 1, as well as ensure there is clear indication that it is not being actively maintained currently.

I'll leave the thread open for a bit just to be sure I'm not missing anything. I wish you good luck with your project!

Ben

bensutherland commented 2 years ago

Hi @midwesternmouse , I can confirm now that this pipeline was fully designed using OBITools v.1.2.11, and I suspect it would require some significant edits to make it functional with new OBITools. Ben

midwesternmouse commented 2 years ago

Hi @bensutherland, that makes a lot of sense, thank you for checking which pipeline this used. I have been trying to contact the developers of Obitools 3 & will continue my efforts there! Thank you for being so responsive, especially i/r/t old code! I do have a quick last question -- if this is old code, and you no longer use Obitools, what program / coding do you use now (if you still do metabarcoding)? Are there other programs you would suggest aside from Obitools?

bensutherland commented 2 years ago

Hi @midwesternmouse , That is a good plan; hopefully the developers are able to solve the issue. I am not actively doing metabarcoding at the moment, but if I were to start a new project, I would probably adapt the pipeline to OBITools3 or try it with OBITools v.1. There is also a pipeline developed by a very talented colleague of mine that I would give a try: https://github.com/enormandeau/barque

I'll close this issue now, but please feel free to contact me by email if there is anything else I can help with. My email can be found here: http://benjgsutherland.com

Ben