AmpSeqR with demultiplexed data

kawu001 commented 8 months ago

Hi, thanks for releasing a good package. I have the following issues:

I have already-demultiplexed data with more than 40 markers (SSRs and SNPs). I tried following the SARS-COVID example used in the AmpSeqR paper, but I didn't understand where and how to get the demultiplexed.rds file that is read into the package.

I ran this code:

# Filter and trim reads
flt_reads <- demultiplexed %>%
  dada_filter(output_dir = run_dir, output_sub_dir = file.path(run_dir, 'filter'))

and I got this error:

Error in dada2::filterAndTrim(fwd = reads_1, filt = reads_1_out, rev = reads_2, :
  All output files must be distinct.

How do you think I can solve these issues?

Thanks.

jemunro commented 8 months ago

Hi kawu001,

If your reads are already demultiplexed, you can create a table in the same format as the one in the vignette, with the following columns:

- sample_id: unique sample identifier
- marker_id: unique amplicon/marker identifier
- reads_1: path to demultiplexed and trimmed forward reads in FASTQ format
- reads_2: path to demultiplexed and trimmed reverse reads in FASTQ format
- n: number of reads
- sample: optional sample name (can be NA if not used)
- info: optional sample metadata (can be NA if not used)

You can create this as a CSV file and read it in with demultiplexed <- read_csv('my_file.csv').
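As a rough illustration of the table described above, here is a minimal base-R sketch that builds it from a directory of paired FASTQ files. The helper names (count_reads, build_manifest) and the file naming pattern <sample_id>_<marker_id>_R1.fastq / _R2.fastq are assumptions made for this example, not part of AmpSeqR; adapt the pattern to your own file names.

```r
# Hypothetical helper: count reads in a FASTQ file (plain or gzipped),
# assuming the standard 4 lines per record.
count_reads <- function(path) {
  length(readLines(gzfile(path))) %/% 4L
}

# Hypothetical helper: build the manifest table from a directory.
# Assumes files are named <sample_id>_<marker_id>_R1.fastq(.gz) and _R2.fastq(.gz).
build_manifest <- function(fastq_dir) {
  reads_1 <- sort(list.files(fastq_dir, pattern = "_R1\\.fastq(\\.gz)?$",
                             full.names = TRUE))
  reads_2 <- sub("_R1(\\.fastq(\\.gz)?)$", "_R2\\1", reads_1)
  stopifnot(all(file.exists(reads_2)))  # every forward file needs a mate
  stem <- sub("_R1\\.fastq(\\.gz)?$", "", basename(reads_1))
  data.frame(
    sample_id = sub("_[^_]*$", "", stem),  # everything before the last "_"
    marker_id = sub("^.*_", "", stem),     # everything after the last "_"
    reads_1   = reads_1,
    reads_2   = reads_2,
    n         = vapply(reads_1, count_reads, integer(1), USE.NAMES = FALSE),
    sample    = rep(NA_character_, length(reads_1)),
    info      = rep(NA_character_, length(reads_1)),
    stringsAsFactors = FALSE
  )
}

# Toy example with two samples and one marker, written to a temporary directory.
tmp <- file.path(tempdir(), "ampseqr_manifest_demo")
dir.create(tmp, showWarnings = FALSE)
record <- c("@read1", "ACGT", "+", "IIII")
for (f in c("S1_mkA_R1.fastq", "S1_mkA_R2.fastq",
            "S2_mkA_R1.fastq", "S2_mkA_R2.fastq")) {
  writeLines(record, file.path(tmp, f))
}
demultiplexed <- build_manifest(tmp)
print(demultiplexed[, c("sample_id", "marker_id", "n")])
```

The resulting data frame can be saved with write.csv(demultiplexed, 'my_file.csv', row.names = FALSE) and read back as described above.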
The "All output files must be distinct" error indicates that there are duplicated paths in the reads_1 or reads_2 columns. Every row should contain two distinct FASTQ file paths for the forward and reverse reads.
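One way to look for such duplicates before running the filter step is sketched below in base R. The function name find_path_problems and the toy table are made up for illustration; the only assumption about the real table is that it has the reads_1 and reads_2 columns listed above.

```r
# Hypothetical helper: return the rows whose reads_1 or reads_2 path
# also appears anywhere else in the two path columns.
find_path_problems <- function(manifest) {
  all_paths <- c(manifest$reads_1, manifest$reads_2)
  dup_paths <- unique(all_paths[duplicated(all_paths)])
  is_bad <- manifest$reads_1 %in% dup_paths | manifest$reads_2 %in% dup_paths
  manifest[is_bad, , drop = FALSE]
}

# Toy table: sample S2 wrongly lists its forward file as its reverse file.
manifest <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  marker_id = "mkA",
  reads_1   = c("a_R1.fastq", "b_R1.fastq", "c_R1.fastq"),
  reads_2   = c("a_R2.fastq", "b_R1.fastq", "c_R2.fastq"),
  stringsAsFactors = FALSE
)
problems <- find_path_problems(manifest)
print(problems)
```

An empty result means every path in the two columns is unique.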

Hope that helps.

kawu001 commented 7 months ago

Hi Jemunro,

Thanks, it worked fine and I was able to run haplotype filtering. However, I encountered another error:

seq_flt_tbl <- sequence_filter(
  seq_ann_tbl = seq_ann_tbl,
  sample_manifest = sample_manifest,
  marker_info = marker_info,
  output_dir = run_dir,
  vcf_output_dir = file.path(run_dir, 'vcf'),
  max_sm_miss = 1,
  max_marker_miss = 1,
  min_homo_rep = NULL,
  terminal_region_len = NULL
)

Error in auto_copy():
! x and y must share the same src.
ℹ x is a <tbl_df/tbl/data.frame> object.
ℹ y is NULL.
ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/rlang_error>
Error in auto_copy():
! x and y must share the same src.
ℹ x is a <tbl_df/tbl/data.frame> object.
ℹ y is NULL.
ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).

Backtrace:
▆
├─AmpSeqR::sequence_filter(...)
│ ├─base::suppressMessages(...)
│ │ └─base::withCallingHandlers(...)
│ └─seq_tbl_sameRef %>% ...
├─dplyr::left_join(...)
└─dplyr:::left_join.data.frame(...)
  └─dplyr::auto_copy(x, y, copy = copy)
Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.

rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in auto_copy():
! x and y must share the same src.
ℹ x is a <tbl_df/tbl/data.frame> object.
ℹ y is NULL.
ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).

Backtrace:
▆
├─AmpSeqR::sequence_filter(...)
│ ├─base::suppressMessages(...)
│ │ └─base::withCallingHandlers(...)
│ └─seq_tbl_sameRef %>% ...
├─dplyr::left_join(...)
└─dplyr:::left_join.data.frame(...)
  └─dplyr::auto_copy(x, y, copy = copy)
    └─rlang::abort(bullets)

Kindly look into this. Thanks.
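For context, the backtrace shows the error being raised by dplyr::left_join() with y reported as NULL, that is, the right-hand table of a join inside sequence_filter() was NULL rather than a data frame. The sketch below reproduces the same class of error outside AmpSeqR with made-up tables; it does not show which object inside sequence_filter() was NULL, which the trace alone does not establish.

```r
library(dplyr)

# Made-up left-hand table; the column names are illustrative only.
x <- tibble(marker_id = c("mkA", "mkB"), count = c(10L, 20L))

# A right-hand "table" that was never created, as in the error above.
y <- NULL

# left_join() refuses to join a data frame with NULL; capture the error
# instead of stopping the script.
err <- tryCatch(
  left_join(x, y, by = "marker_id"),
  error = function(e) e
)
print(conditionMessage(err))
```

Checking is.null() on each intermediate table before a join is a quick way to find which input is missing.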

kawu001 commented 7 months ago

This is where I think something might be wrong.

In the haplotype filtering step, the input is seq_ann_tbl. In the example table, the last two columns (ident and ident_z) are both numbers. In my analysis, however, ident is a number but ident_z is NA. When both columns are numbers, haplotype filtering runs, so I think there should be a way to proceed when ident_z is NA.

BBT_NA_file
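To see how widespread the NA values are, a per-marker summary can help. This is a minimal base-R sketch on a made-up table: it assumes seq_ann_tbl has a marker_id column in addition to the ident and ident_z columns mentioned above, and the function name summarise_ident_na is hypothetical. It only reports where ident_z is NA and does not change how AmpSeqR treats those rows.

```r
# Hypothetical helper: count rows and NA ident_z values for each marker.
# Assumes columns marker_id and ident_z exist in the input table.
summarise_ident_na <- function(seq_ann_tbl) {
  by_marker <- split(seq_ann_tbl, seq_ann_tbl$marker_id)
  data.frame(
    marker_id    = names(by_marker),
    n_rows       = vapply(by_marker, nrow, integer(1), USE.NAMES = FALSE),
    n_ident_z_na = vapply(by_marker, function(d) sum(is.na(d$ident_z)),
                          integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Toy data: marker mkB has an ident value but no ident_z.
toy_tbl <- data.frame(
  marker_id = c("mkA", "mkA", "mkB"),
  ident     = c(0.99, 0.95, 0.98),
  ident_z   = c(0.7, -0.7, NA),
  stringsAsFactors = FALSE
)
na_summary <- summarise_ident_na(toy_tbl)
print(na_summary)
```

Markers where n_ident_z_na equals n_rows are the ones to report when following up on this issue.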

bahlolab / AmpSeqR

AmpSeqR with demultiplexed data #6