bahlolab / AmpSeqR

Multi-locus Sequence Type Identification from Multiplexed Amplicon Deep Sequencing
GNU General Public License v3.0

AmpSeqR with demultiplexed data #6

Open kawu001 opened 8 months ago

kawu001 commented 8 months ago

Hi, thanks for releasing a good package. I have the following issues:

  1. I have already-demultiplexed data with more than 40 markers (SSRs and SNPs). I tried following the SARS-COVID example from the AmpSeqR paper, but I didn't understand where and how to get the demultiplexed.rds file that you read into the package.
  2. I ran this code:

         # Filter and trim reads
         flt_reads <- demultiplexed %>%
           dada_filter(output_dir = run_dir,
                       output_sub_dir = file.path(run_dir, 'filter'))

     and I got this error:

         Error in dada2::filterAndTrim(fwd = reads_1, filt = reads_1_out, rev = reads_2, : All output files must be distinct.

How do you think I can solve these issues?

Thanks.

jemunro commented 8 months ago

Hi kawu001,

  1. If your reads are already demultiplexed, you can create a table in the same format as the one in the vignette, with the following columns:

         sample_id: unique sample identifier
         marker_id: unique amplicon/marker identifier
         reads_1: path to demultiplexed and trimmed forward reads in FASTQ format
         reads_2: path to demultiplexed and trimmed reverse reads in FASTQ format
         n: number of reads
         sample: optional sample name (can be NA if not used)
         info: optional sample metadata (can be NA if not used)

     You can create this as a CSV file and read it in with demultiplexed <- read_csv('my_file.csv').
  2. This error indicates that there are duplicated paths in the reads_1 or reads_2 columns - every row should contain two distinct FASTQ file paths for the forward and reverse reads.
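
The two points above can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample names, marker names, the `fastq` directory, and the `<sample_id>_<marker_id>_R1.fastq.gz` naming scheme are all hypothetical, and `find_duplicated_paths()` is a helper written for this example, not an AmpSeqR function. The `n` column is left as a placeholder that should be filled with your own read counts.

```r
# Hypothetical sample and marker names, used only for illustration.
samples <- c("S1", "S2")
markers <- c("M1", "M2")
fastq_dir <- "fastq"  # hypothetical directory holding the demultiplexed reads

# One row per sample/marker combination.
grid <- expand.grid(sample_id = samples, marker_id = markers,
                    stringsAsFactors = FALSE)

# Assumed naming scheme: <sample_id>_<marker_id>_R1.fastq.gz / _R2.fastq.gz
read_path <- function(read) {
  file.path(fastq_dir,
            paste0(grid$sample_id, "_", grid$marker_id, "_R", read, ".fastq.gz"))
}

demultiplexed <- data.frame(
  sample_id = grid$sample_id,
  marker_id = grid$marker_id,
  reads_1   = read_path(1),
  reads_2   = read_path(2),
  n         = NA_integer_,    # placeholder: replace with the real read counts
  sample    = NA_character_,  # optional
  info      = NA_character_,  # optional
  stringsAsFactors = FALSE
)

# Helper for this example: return the rows whose reads_1 or reads_2 path
# occurs more than once anywhere in the two path columns.
find_duplicated_paths <- function(tbl) {
  all_paths <- c(tbl$reads_1, tbl$reads_2)
  dup_paths <- unique(all_paths[duplicated(all_paths)])
  tbl[tbl$reads_1 %in% dup_paths | tbl$reads_2 %in% dup_paths, , drop = FALSE]
}

# An empty result means every path in the table is distinct.
problem_rows <- find_duplicated_paths(demultiplexed)
print(problem_rows)

# Write the table so it can be read back in later.
csv_path <- file.path(tempdir(), "my_file.csv")
write.csv(demultiplexed, csv_path, row.names = FALSE)
```

With readr installed, the written file can then be loaded as in the reply above, for example with `demultiplexed <- read_csv(csv_path)`. Any rows returned by the helper are the ones to inspect before calling dada_filter.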

Hope that helps.

kawu001 commented 7 months ago

Hi Jemunro,

Thanks, it worked fine and I was able to run haplotype filtering. However, I then encountered another error:

    seq_flt_tbl <- sequence_filter(seq_ann_tbl = seq_ann_tbl,
                                   sample_manifest = sample_manifest,
                                   marker_info = marker_info,
                                   output_dir = run_dir,
                                   vcf_output_dir = file.path(run_dir, 'vcf'),
                                   max_sm_miss = 1,
                                   max_marker_miss = 1,
                                   min_homo_rep = NULL,
                                   terminal_region_len = NULL
                                   )

    Error in auto_copy():
    ! x and y must share the same src.
    ℹ x is a <tbl_df/tbl/data.frame> object.
    ℹ y is NULL.
    ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).
    Run rlang::last_trace() to see where the error occurred.

    rlang::last_trace()
    <error/rlang_error>
    Error in auto_copy():
    ! x and y must share the same src.
    ℹ x is a <tbl_df/tbl/data.frame> object.
    ℹ y is NULL.
    ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).

    Backtrace:
        ▆
     1. ├─AmpSeqR::sequence_filter(...)
     2. │ ├─base::suppressMessages(...)
     3. │ │ └─base::withCallingHandlers(...)
     4. │ └─seq_tbl_sameRef %>% ...
     5. ├─dplyr::left_join(...)
     6. └─dplyr:::left_join.data.frame(...)
     7.   └─dplyr::auto_copy(x, y, copy = copy)
    Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.

    rlang::last_trace(drop = FALSE)
    <error/rlang_error>
    Error in auto_copy():
    ! x and y must share the same src.
    ℹ x is a <tbl_df/tbl/data.frame> object.
    ℹ y is NULL.
    ℹ Set copy = TRUE if y can be copied to the same source as x (may be slow).

    Backtrace:
        ▆
     1. ├─AmpSeqR::sequence_filter(...)
     2. │ ├─base::suppressMessages(...)
     3. │ │ └─base::withCallingHandlers(...)
     4. │ └─seq_tbl_sameRef %>% ...
     5. ├─dplyr::left_join(...)
     6. └─dplyr:::left_join.data.frame(...)
     7.   └─dplyr::auto_copy(x, y, copy = copy)
     8.     └─rlang::abort(bullets)

Kindly look into this, Thanks.

kawu001 commented 7 months ago

This is where I think something might be wrong.

In the haplotype filtering step, the input is seq_ann_tbl. In the example table, the last two columns (ident and ident_z) both contain numbers. In my analysis, however, ident is a number but ident_z is NA. When both columns are numbers, the haplotype filtering runs, so I think there should be a way to proceed when ident_z is NA. BBT_NA_file
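
One way to quantify this observation is to count how many ident_z values are NA for each marker. The following base R sketch uses a small made-up table in place of the real seq_ann_tbl; the values are invented, and the presence of a marker_id column in seq_ann_tbl is an assumption. It only describes the NA pattern and is not a fix for the error above.

```r
# Made-up stand-in for seq_ann_tbl; only the columns needed for the check.
# The marker_id column is assumed to exist in the real table.
seq_ann_example <- data.frame(
  marker_id = c("M1", "M1", "M2", "M2"),
  ident     = c(0.99, 0.97, 0.98, 0.95),
  ident_z   = c(0.5, -0.5, NA, NA),
  stringsAsFactors = FALSE
)

# Number of rows with a missing ident_z, per marker.
na_by_marker <- tapply(is.na(seq_ann_example$ident_z),
                       seq_ann_example$marker_id,
                       sum)
print(na_by_marker)

# Markers for which every ident_z value is missing.
all_na_markers <- names(na_by_marker)[
  na_by_marker == table(seq_ann_example$marker_id)[names(na_by_marker)]
]
print(all_na_markers)
```

Running the same two summaries on the real table would show whether the NA values are confined to particular markers or spread across all of them.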