JRaviLab / MolEvolvR

An R Package for characterizing proteins using molecular evolution and phylogeny
https://jravilab.github.io/MolEvolvR/

Refactor Example Data: R-CMD check error #118

Open the-mayer opened 3 weeks ago

the-mayer commented 3 weeks ago

An R-CMD check error occurs when running this example code for reverseOperonSeq():

❯ checking examples ... [14s/15s] ERROR
  Running examples in ‘MolEvolvR-Ex.R’ failed
  The error most likely occurred in:

  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: reverseOperonSeq
  > ### Title: reverseOperon: Reverse the Direction of Operons in Genomic
  > ###   ContextSeq
  > ### Aliases: reverseOperonSeq
  > 
  > ### ** Examples
  > 
  > # Example genomic context data frame
  > prot <- data.frame(GenContext = c("A>B", "C<D", "E=F*G", "H>I"))
  > reversed_prot <- reverseOperonSeq(prot)
  Error in ge[[x]] : subscript out of bounds
  Calls: reverseOperonSeq -> lapply -> FUN -> straightenOperonSeq
  Execution halted

With the example data defined in prot, the error occurs during the lapply() operation at line 137: ge has length 0 because of the preceding subset operation.
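For anyone following along without the package source open, here is a minimal sketch of how an empty list can produce this exact message. It assumes the index passed to lapply() is built as 1:length(ge), which the traceback suggests but does not confirm. The ge below is a hypothetical stand-in, not the real object from reverseOperonSeq().

```r
# Hypothetical stand-in for the list that ends up empty at line 137
ge <- list()

# ASSUMPTION: the index is built like this. For an empty list, 1:length(ge)
# is 1:0, which counts down and yields c(1, 0) rather than an empty vector.
bad_index <- 1:length(ge)

# seq_along() returns integer(0) for an empty list, so nothing is iterated
safe_index <- seq_along(ge)

# Extracting element 1 from an empty list is what raises the error
err <- tryCatch(ge[[1]], error = function(e) conditionMessage(e))
print(err)
```

If the real code differs from this assumption, the same check (inspecting the index vector just before lapply()) should still show why FUN is called at all when ge is empty.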

_Originally posted by @the-mayer in https://github.com/JRaviLab/MolEvolvR/pull/97#discussion_r1819485400_

Joiejoie1 commented 2 weeks ago

@jananiravi @the-mayer Can I be assigned to this issue? I want to give it a try.

jananiravi commented 2 weeks ago

Sure, @Joiejoie1!

Joiejoie1 commented 2 weeks ago

> Sure, @Joiejoie1!

Thanks @jananiravi

Joiejoie1 commented 2 weeks ago

@jananiravi @the-mayer This is how I plan to address "Refactor Example Data: R-CMD check error" (#118):

  1. Inspect the Code to Understand the Logic:

reverseOperonSeq() takes a data frame with a GenContext column, splits each entry, and manipulates the resulting genomic context strings. straightenOperonSeq() annotates elements with directional indicators based on certain rules.

  1. Debugging the ge List Initialization:

The error message (subscript out of bounds) suggests that ge is empty or not structured as expected at the lapply() call. Check where ge is assigned in the code to see if it could be empty. You’ll see that ge is created from te[witheq], where witheq is derived from te.

  1. Print Statements to Check Intermediate Variables:

Insert print() statements before the problematic line (line 137) to output the contents of te, witheq, and ge:

    print(te)     # Check the contents of te before filtering
    print(witheq) # Verify which elements of te have "="
    print(ge)     # Ensure ge has the expected structure and elements

This shows whether ge is empty or incorrectly structured before lapply() processes it.

  1. Run the Code in Chunks:

Execute the code in sections (or line by line) up to line 137 to isolate where ge might become empty or miss elements.

  1. Test with Known Example Data:

Define example input data and run it through reverseOperonSeq() to observe how the function handles the input and where it fails. Use the example from the documentation:

    prot <- data.frame(GenContext = c("A>B", "C<D", "E=F*G", "H>I"))
    reversed_prot <- reverseOperonSeq(prot)

This test shows whether the error is reproducible with the sample input, and lets you check how the function processes each step.

  1. Check for Edge Cases:

Consider whether certain patterns in the GenContext column could lead to ge becoming empty, such as entries that lack particular characters or symbols. Modify the input data to include different patterns or minimal cases (like c("A>B")) to see whether the function consistently produces an output.
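A small harness can make this edge-case sweep repeatable. The sketch below is only an illustration: probe_inputs() and toy_fn() are hypothetical names, and toy_fn() is a stand-in that fails when no entry contains "=", used so the code runs without MolEvolvR installed. In practice you would pass reverseOperonSeq as fn.

```r
# Run `fn` on each candidate GenContext string and record "ok" or the error
# message. `fn` is any function that accepts a data frame with a GenContext
# column (reverseOperonSeq in the real package).
probe_inputs <- function(fn, inputs) {
  vapply(inputs, function(ctx) {
    prot <- data.frame(GenContext = ctx)
    tryCatch({
      fn(prot)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# HYPOTHETICAL stand-in, not the package's logic: errors when no entry
# contains "=".
toy_fn <- function(prot) {
  hits <- grep("=", prot$GenContext, fixed = TRUE)
  if (length(hits) == 0) stop("no '=' found")
  prot
}

results <- probe_inputs(toy_fn, c("A>B", "E=F*G"))
print(results)
```

The result is a named character vector keyed by input string, so failing patterns are easy to spot and paste into the issue.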

  1. Add Conditionals to Handle Empty Cases:

If ge can be empty, modify the code to handle this case before applying lapply():

    if (length(ge) > 0) {
      ge <- lapply(1:length(ge), function(x) straightenOperonSeq(ge[[x]]))
    } else {
      warning("No elements to process in ge; skipping lapply operation.")
    }

This condition ensures that lapply() only runs if ge contains elements.
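One possible alternative to the explicit length check, offered as a sketch rather than the package's actual fix, is to iterate over the list itself so that an empty list makes zero calls. toy_straighten() below is a hypothetical stand-in for straightenOperonSeq().

```r
# HYPOTHETICAL stand-in for straightenOperonSeq(), so the sketch runs
# without MolEvolvR installed
toy_straighten <- function(x) toupper(x)

ge_empty <- list()
ge_full  <- list("a=b", "c=d")

# Iterating over the list directly: zero elements means zero calls
out_empty <- lapply(ge_empty, toy_straighten)
out_full  <- lapply(ge_full, toy_straighten)

print(length(out_empty))
print(out_full)
```

Whether an empty ge should pass silently or still raise a warning is a design choice for the maintainers. The guard with warning() keeps the condition visible, while direct iteration keeps the code shorter.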

  1. Save and Rerun the Script:

Save the modified file and run the code from start to finish to confirm that it executes without errors.

  1. Verify Using R CMD check:

If the function works as expected, run devtools::check() to ensure the package passes R CMD check without errors.

Joiejoie1 commented 1 week ago

@jananiravi @the-mayer I have created a PR for this issue.