databrary / datavyu

Desktop video coding/annotation tool
http://datavyu.org/
GNU General Public License v3.0

Inconsistent Script Error #415

Open margaretshavlik opened 2 years ago

margaretshavlik commented 2 years ago

Hello!

I have multiple coded participant videos (as .opf files) that I want to export into a single combined (long-format) .csv file to analyze in R for my dissertation. The files are organized by the book-reading session each participant completed (e.g., "LC1", "LC2"). I adapted an example script template from the repository on GitHub/Datavyu's website, filling in the relevant information for my purposes. The export works for one of the folders ("LC1"), but for the other ("LC2") I receive the following error when the script reaches certain (but not all) files in the folder:

"org.jruby.embed.EvalFailedException: java.lang.NullPointerException"

I can't figure out what this refers to, and I can't find anything different about the 'offending' files compared with the others in the same folder, or any reason the issue arises only for files in this folder when I am using exactly the same code that works for the other folder. If I remove the offending file from the folder and run the script again, a different file triggers the same error (I have tried this several times).
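
One way to list every failing file in a single run, rather than removing them one at a time, might be to rescue errors per file. This is only a minimal sketch: `process_each_file` is a hypothetical helper (not part of the Datavyu API), the file names are made up, and the block body stands in for the real per-file export. The broad `rescue Exception` rests on the assumption that a Java exception raised under JRuby may not be a `StandardError`.

```ruby
# Hypothetical helper: runs the block once per file and records failures
# instead of aborting the whole run on the first one.
def process_each_file(files)
  failures = {}
  files.each do |file|
    begin
      yield file
    rescue Interrupt, SystemExit
      raise # still let the user stop the script normally
    rescue Exception => e
      # Assumption: under JRuby a Java exception may not be a StandardError,
      # hence the broad rescue for diagnostic purposes only.
      failures[file] = "#{e.class}: #{e.message}"
    end
  end
  failures
end

# Stand-in for the real per-file export; 'b.opf' simulates a bad file.
failures = process_each_file(%w[a.opf b.opf c.opf]) do |file|
  raise ArgumentError, 'simulated failure' if file == 'b.opf'
end

failures.each { |file, msg| puts "FAILED #{file}: #{msg}" }
```

In the real script the block would contain the body of the `infiles.each` loop, and the resulting list could show whether the failing files have anything in common.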

Here is my Ruby code (I have attached the .opf files as well).

Thank you in advance!

#################### LC 1 ###################

## Parameters
input_folder = '/Users/margaret/Library/CloudStorage/Box-Box/DR_CODING/Back-Up_Files/LC-Session1_Back-Ups'
output_file = '/Users/margaret/Library/CloudStorage/Box-Box/DR_CODING/Back-Up_Files/DataExporting/LC1_DR_export.csv'
code_map = {
  'id' => %w(dr session booktitle coder1 coder2 numkids),
  'coder1' => %w(ordinal onset offset speaker category subcategory),
  'notes_c1' => %w(note)}
static_columns = %w(id) #(these columns will have codes from first cell repeated for entire file)
nested_columns = %w()
sequential_columns = %w(coder1 notes_c1) #(cells from these columns will be exported as single rows for each code)
blank_value = '' # code to put in for missing cells
delimiter = ','

# Set to true to force a row to be printed for each innermost-nested cell.
# Default behavior is to skip nested cells that don't have any data for sequential cells.
ensure_rows_per_nested_cell = true

## Body
require 'Datavyu_API.rb'

data = []
# Header order is: static, nested, sequential
header = (static_columns + nested_columns + sequential_columns).map do |colname|
  code_map[colname].map{ |codename| "#{colname}_#{codename}" }
end
header.flatten!
data << header.join(delimiter)

# Init arrays of default values.
default_data = {}
code_map.each_pair{ |k, v| default_data[k] = [blank_value] * v.size }

input_path = File.expand_path(input_folder)
infiles = Dir.chdir(input_path){ Dir.glob('*.opf') }

infiles.each do |infile|
  $db, $pj = load_db(File.join(input_path, infile))

  puts "working on #{infile}..."

  columns = {}
  code_map.keys.each{ |x| columns[x] = get_column(x) }

  # Get static data from first cells.
  static_data = static_columns.map do |colname|
    col = columns[colname]
    cell = col.cells.first
    raise "Can't find a cell in column '#{colname}' of #{infile}" if cell.nil? # static columns must contain at least one cell

    cell.get_codes(code_map[colname])
  end
  static_data.flatten!

  # Iterate over cells of innermost-nested column.
  if(nested_columns.empty?)
    inner_data = []
    outer_data = []

    # Iterate over sequential columns.
    rows_added = 0
    sequential_columns.each do |scol|
      # Reset data hash so values are not carried over.
      seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

      # Iterate over sequential cells nested inside inner cell.
      seq_cells = columns[scol].cells
      seq_cells.each do |scell|
        seq_data[scol] = scell.get_codes(code_map[scol])

        row = static_data + outer_data + inner_data + seq_data.values.flatten
        data << row.join(delimiter)
        rows_added += 1
      end
    end
  else
    inner_col = nested_columns.last
    outer_cols = nested_columns[0..-2]
    columns[inner_col].cells.each do |icell|
      inner_data = icell.get_codes(code_map[inner_col])
      outer_data = outer_cols.map do |ocol|
        ocell = columns[ocol].cells.find{ |x| x.contains(icell) }
        raise "Can't find nesting cell in column #{ocol} for cell #{icell.ordinal} in column #{inner_col}." if ocell.nil?
        ocell.get_codes(code_map[ocol])
      end
      outer_data.flatten!

      # Init blank data hash so that data for this column is placed properly.
      seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

      # Iterate over sequential columns.
      rows_added = 0
      sequential_columns.each do |scol|
        # Reset data hash so values are not carried over.
        seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

        # Iterate over sequential cells nested inside inner cell.
        seq_cells = columns[scol].cells.select{ |x| icell.contains(x) }
        seq_cells.each do |scell|
          seq_data[scol] = scell.get_codes(code_map[scol])

          row = static_data + outer_data + inner_data + seq_data.values.flatten
          data << row.join(delimiter)
          rows_added += 1
        end
      end

      # Edge case for no nested sequential cell(s).
      if(rows_added == 0 && ensure_rows_per_nested_cell)
        row = static_data + outer_data + inner_data + seq_data.values.flatten
        data << row.join(delimiter)
        rows_added +=1
      end
    end
  end
end

puts "Writing data to file..."
# Block form closes the file even if writing raises.
File.open(File.expand_path(output_file), 'w') do |outfile|
  outfile.puts data
end

puts "Finished."
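
A side note on the `row.join(delimiter)` step in the script above: a code value that itself contains the delimiter (a free-text note with a comma, say) would shift columns when the file is read into R. The following is a hedged sketch using Ruby's standard `csv` library with made-up values; whether that library loads inside Datavyu's embedded JRuby is an assumption, and this is not claimed to be related to the NullPointerException.

```ruby
require 'csv'

# Made-up row; the second value contains the delimiter.
row = ['LC1', 'points, then laughs', '1200']

naive  = row.join(',')                 # the comma inside the note splits the field
quoted = CSV.generate_line(row).chomp  # fields are quoted where needed

puts naive
puts quoted
```

Under these assumptions, `data << CSV.generate_line(row).chomp` could take the place of `data << row.join(delimiter)` in the script.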

########################### LC 2 ######################
[LC-Session1_Back-Ups.zip](https://github.com/databrary/datavyu/files/8599812/LC-Session1_Back-Ups.zip)

## Parameters
input_folder = '/Users/margaret/Library/CloudStorage/Box-Box/DR_CODING/Back-Up_Files/LC-Session2_Back-Ups'
output_file = '/Users/margaret/Library/CloudStorage/Box-Box/DR_CODING/Back-Up_Files/DataExporting/LC2_DR_export.csv'

code_map = {
  'id' => %w(dr session booktitle coder1 coder2 numkids),
  'coder1' => %w(ordinal onset offset speaker category subcategory),
  'notes_c1' => %w(note)}
static_columns = %w(id) #(these columns will have codes from first cell repeated for entire file)
nested_columns = %w()
sequential_columns = %w(coder1 notes_c1) #(cells from these columns will be exported as single rows for each code)
blank_value = '' # code to put in for missing cells
delimiter = ','

# Set to true to force a row to be printed for each innermost-nested cell.
# Default behavior is to skip nested cells that don't have any data for sequential cells.
ensure_rows_per_nested_cell = true

## Body
require 'Datavyu_API.rb'

data = []
# Header order is: static, nested, sequential
header = (static_columns + nested_columns + sequential_columns).map do |colname|
  code_map[colname].map{ |codename| "#{colname}_#{codename}" }
end
header.flatten!
data << header.join(delimiter)

# Init arrays of default values.
default_data = {}
code_map.each_pair{ |k, v| default_data[k] = [blank_value] * v.size }

input_path = File.expand_path(input_folder)
infiles = Dir.chdir(input_path){ Dir.glob('*.opf') }

infiles.each do |infile|
  $db, $pj = load_db(File.join(input_path, infile))

  puts "working on #{infile}..."

  columns = {}
  code_map.keys.each{ |x| columns[x] = get_column(x) }

  # Get static data from first cells.
  static_data = static_columns.map do |colname|
    col = columns[colname]
    cell = col.cells.first
    raise "Can't find a cell in column '#{colname}' of #{infile}" if cell.nil? # static columns must contain at least one cell

    cell.get_codes(code_map[colname])
  end
  static_data.flatten!

  # Iterate over cells of innermost-nested column.
  if(nested_columns.empty?)
    inner_data = []
    outer_data = []

    # Iterate over sequential columns.
    rows_added = 0
    sequential_columns.each do |scol|
      # Reset data hash so values are not carried over.
      seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

      # Iterate over sequential cells nested inside inner cell.
      seq_cells = columns[scol].cells
      seq_cells.each do |scell|
        seq_data[scol] = scell.get_codes(code_map[scol])

        row = static_data + outer_data + inner_data + seq_data.values.flatten
        data << row.join(delimiter)
        rows_added += 1
      end
    end
  else
    inner_col = nested_columns.last
    outer_cols = nested_columns[0..-2]
    columns[inner_col].cells.each do |icell|
      inner_data = icell.get_codes(code_map[inner_col])
      outer_data = outer_cols.map do |ocol|
        ocell = columns[ocol].cells.find{ |x| x.contains(icell) }
        raise "Can't find nesting cell in column #{ocol} for cell #{icell.ordinal} in column #{inner_col}." if ocell.nil?
        ocell.get_codes(code_map[ocol])
      end
      outer_data.flatten!

      # Init blank data hash so that data for this column is placed properly.
      seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

      # Iterate over sequential columns.
      rows_added = 0
      sequential_columns.each do |scol|
        # Reset data hash so values are not carried over.
        seq_data = default_data.select{ |k, v| sequential_columns.include?(k) }

        # Iterate over sequential cells nested inside inner cell.
        seq_cells = columns[scol].cells.select{ |x| icell.contains(x) }
        seq_cells.each do |scell|
          seq_data[scol] = scell.get_codes(code_map[scol])

          row = static_data + outer_data + inner_data + seq_data.values.flatten
          data << row.join(delimiter)
          rows_added += 1
        end
      end

      # Edge case for no nested sequential cell(s).
      if(rows_added == 0 && ensure_rows_per_nested_cell)
        row = static_data + outer_data + inner_data + seq_data.values.flatten
        data << row.join(delimiter)
        rows_added +=1
      end
    end
  end
end

puts "Writing data to file..."
# Block form closes the file even if writing raises.
File.open(File.expand_path(output_file), 'w') do |outfile|
  outfile.puts data
end

puts "Finished."
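
One check that might help narrow this down is to confirm that every column named in `code_map` exists in each file before any cells are read. This is a minimal sketch on stand-in data: `missing_columns` is a hypothetical helper, and it assumes that `get_column` returns `nil` for a column that is absent from a file, which I have not verified.

```ruby
# Hypothetical helper: names of expected columns whose lookup returned nil.
def missing_columns(expected_names, columns)
  expected_names.select { |name| columns[name].nil? }
end

# Stand-in data: in the real script `columns` would be filled by get_column.
code_map = {
  'id'       => %w(dr session),
  'coder1'   => %w(onset offset),
  'notes_c1' => %w(note)
}
columns = { 'id' => :some_column, 'coder1' => :some_column, 'notes_c1' => nil }

missing = missing_columns(code_map.keys, columns)
puts "missing columns: #{missing.join(', ')}" unless missing.empty?
```

Placed right after the `columns` hash is built inside the per-file loop, a check like this would report which file lacks which column instead of failing later.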

Attachments: LC-Session1_Back-Ups.zip, LC-Session2_Back-Ups.zip