LemmaLegalConsulting / docassemble-tclpgoogledocsmerger

A docassemble extension.
https://docassemble.org
MIT License

Error with `Has URL` #38

Closed · nonprofittechy closed 2 years ago

nonprofittechy commented 2 years ago

Possibly `if row[1]['Has URL']:` needs to be changed to `if row[1].get('Has URL'):`.
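For illustration, a minimal sketch of the difference (hypothetical one-row DataFrame standing in for the clause spreadsheet): plain indexing raises `KeyError` when the column is absent from the row's `Series`, while `Series.get()` returns `None`, letting the code fall through to the Google Drive URL branch.

    import pandas as pd

    # Hypothetical frame that lacks the 'Has URL' column, mimicking the data
    # that triggered this error.
    df = pd.DataFrame({"Child's name": ["Example clause"], "Full name": ["Example"]})

    for row in df.iterrows():          # row is an (index, Series) tuple, hence row[1]
        try:
            row[1]['Has URL']          # raises KeyError: 'Has URL'
        except KeyError as err:
            print(f'KeyError: {err}')
        print(row[1].get('Has URL'))   # None; no exception, so the else
                                       # branch (gdrive_base_url) would run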

    KeyError: 'Has URL'

History
Tried to run mandatory code at 0.02838s

id: Main order block
mandatory: True
code: |
  snapshot_interview_state
  accept_gdpr_notice
  reconsider('snapshot_interview_state')
  prep_documents
  if all_clauses:
    review_before_download
  else:
    no_clauses_found
  if the_download_task.ready():
    if the_download_task.failed():
      show_failed_screen
    if the_assembly_task.ready():
      if the_assembly_task.failed():
        show_failed_screen
      reached_download_screen = True
      reconsider('snapshot_interview_state')
      download_filled
    else:
      waiting_screen
  else:
    waiting_screen
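(Aside for readers unfamiliar with docassemble's screen logic: the "Needed definition of ..." entries below are docassemble running a code block, catching the resulting `NameError`, and then seeking a block that defines the missing variable. A rough simulation of that loop, simplified and not the actual `parse.py` implementation:)

    # Simplified sketch of docassemble's seek-on-NameError dependency loop.
    def run_block(source, user_dict):
        try:
            exec(source, user_dict)
        except NameError as err:
            # err looks like: name 'snapshot_interview_state' is not defined
            missing = str(err).split("'")[1]
            return missing             # caller now seeks a block defining this
        return None

    print(run_block("snapshot_interview_state", {}))
    # -> snapshot_interview_state, matching the history entry below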

Needed definition of snapshot_interview_state at 0.05015s
Tried to run block at 0.05021s

code: |
  # Take the interesting fields and make them 2 dimensional so easier to view in XLSX
  stuff_to_snapshot = {
     'start_time': str(start_time().format('yyyy-MM-dd')),
     }
  # Get location
  try:
    import requests
    resp = requests.get(f"https://geolocation-db.com/json/{device(ip=True)}&position=true").json()
    stuff_to_snapshot['country'] = resp.get('country_code')
    stuff_to_snapshot['state'] = resp.get('state')
    stuff_to_snapshot['city'] = resp.get('city')
    stuff_to_snapshot['latitude'] = round(resp['latitude'] * 100) / 100 if 'latitude' in resp else None
    stuff_to_snapshot['longitude'] = round(resp['longitude'] * 100) / 100 if 'longitude' in resp else None
  except:
    stuff_to_snapshot['country'] = 'UNKNOWN'
  # Don't let DA parse nameerrors so that all data is recorded in one block,
  # regardless of how far in interview someone got
  try:
    stuff_to_snapshot['all_clauses'] = comma_list(all_clauses)
    stuff_to_snapshot['all_clauses_full_names'] = comma_list(f'"{clause.full_name}"' for clause in all_clause_objects)
    stuff_to_snapshot['selected_rows'] = comma_list([row[1]["Child's name"] for row in selected_rows.iterrows()])
    if defined('selected_vals'):
      for column in selected_vals:
        if selected_vals[column].any_true():
          stuff_to_snapshot[f"column_{column}"] = comma_list(selected_vals[column].true_values())
  except:
    pass

  if defined('email'):
    stuff_to_snapshot['email'] = email

  stuff_to_snapshot['reached_download_screen'] = defined('reached_download_screen')
  store_variables_snapshot(
      data=stuff_to_snapshot, persistent=True
     )
  snapshot_interview_state = True
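(The `snapshot_interview_state = True` at the end is the usual docassemble run-once idiom: the block executes only while its marker variable is undefined, and the `reconsider('snapshot_interview_state')` calls in the main order block clear the marker so the snapshot is retaken once more variables, such as `email` and `reached_download_screen`, are known. A generic simulation of the pattern, not docassemble's actual machinery:)

    # Toy simulation of the run-once / reconsider pattern.
    cache = {}

    def seek(name, compute):
        if name not in cache:          # runs only while the marker is undefined
            cache[name] = compute()
            print(f'ran block for {name}')
        return cache[name]

    def reconsider(name):
        cache.pop(name, None)          # forget the marker so the block reruns

    seek('snapshot_interview_state', lambda: True)   # first snapshot
    reconsider('snapshot_interview_state')           # as in the main order block
    seek('snapshot_interview_state', lambda: True)   # snapshot retaken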

Tried to run mandatory code at 0.19412s

id: Main order block
mandatory: True
code: |
  snapshot_interview_state
  accept_gdpr_notice
  reconsider('snapshot_interview_state')
  prep_documents
  if all_clauses:
    review_before_download
  else:
    no_clauses_found
  if the_download_task.ready():
    if the_download_task.failed():
      show_failed_screen
    if the_assembly_task.ready():
      if the_assembly_task.failed():
        show_failed_screen
      reached_download_screen = True
      reconsider('snapshot_interview_state')
      download_filled
    else:
      waiting_screen
  else:
    waiting_screen

Needed definition of prep_documents at 0.19416s
Tried to run block at 0.19422s

id: prep documents
code: |
  url_base_str = 'https://chancerylaneproject.org/climate-clauses/{}'
  gdrive_base_url = 'https://docs.google.com/document/d/{}'
  all_clause_ids = []
  all_clauses = []
  all_clause_objects = []
  selected_rows = multi_index.get_full_rows(row_ids)
  for row in selected_rows.iterrows():
    g_file = get_latest_file_for_clause(all_files, row[1]["Child's name"])
    if g_file:
      all_clauses.append(row[1]["Child's name"])
      modified_time = g_file.get('modifiedTime')
      if 'id' in g_file:
        all_clause_ids.append(g_file.get('id'))
      log(f'{row}')
      if row[1]['Has URL']:
        url = url_base_str.format(row[1]['URL arg'].lower().replace(' ', '-').replace('/', '-'))
      else:
        url = gdrive_base_url.format(g_file.get('id'))
      full_name = row[1]['Full name']
      all_clause_objects.append(DAObject(
        name=row[1]["Child's name"],
        full_name=full_name,
        modified_time=modified_time,
        url=url,
        docx_link=f'[{full_name}]({url})',
        file_id=g_file.get('id')
      ))
  prep_documents = True

Needed definition of row_ids at 0.19432s
Tried to run block at 0.19436s

id: select rows
code: |
  query_list = []
  for col_name in selected_vals.keys():
    if selected_vals[col_name].any_true():
      query_list.append([[col_name, selected_vals[col_name].true_values()]])
  true_nested_vals = {}
  for col_and_val, selected in combined_selected_vals.items():
    if selected:
      col, val = col_and_val.split(';;;')
      if col in true_nested_vals:
        true_nested_vals[col].append(val)
      else:
        true_nested_vals[col] = [val]
  query_list.append([(col_name, vals) for col_name, vals in true_nested_vals.items()])
  row_ids = multi_index.query(query_list)
  #del query_list

Tried to run block at 0.19453s

id: prep documents
code: |
  url_base_str = 'https://chancerylaneproject.org/climate-clauses/{}'
  gdrive_base_url = 'https://docs.google.com/document/d/{}'
  all_clause_ids = []
  all_clauses = []
  all_clause_objects = []
  selected_rows = multi_index.get_full_rows(row_ids)
  for row in selected_rows.iterrows():
    g_file = get_latest_file_for_clause(all_files, row[1]["Child's name"])
    if g_file:
      all_clauses.append(row[1]["Child's name"])
      modified_time = g_file.get('modifiedTime')
      if 'id' in g_file:
        all_clause_ids.append(g_file.get('id'))
      log(f'{row}')
      if row[1]['Has URL']:
        url = url_base_str.format(row[1]['URL arg'].lower().replace(' ', '-').replace('/', '-'))
      else:
        url = gdrive_base_url.format(g_file.get('id'))
      full_name = row[1]['Full name']
      all_clause_objects.append(DAObject(
        name=row[1]["Child's name"],
        full_name=full_name,
        modified_time=modified_time,
        url=url,
        docx_link=f'[{full_name}]({url})',
        file_id=g_file.get('id')
      ))
  prep_documents = True

Needed definition of all_files at 0.19593s
Tried to run block at 0.19597s

code: |
  full_clauses_folder_id = "1QNQG3ToIOJ5p3PjC6cPlKgz3nXK3OYJG" #ian's
  # "1YDT_u4AJMzwJKNcAH2naYHhycNd-iHvt" #original
  all_files = get_files_in_folder(folder_id=full_clauses_folder_id)

Tried to run block at 1.96513s

id: prep documents
code: |
  url_base_str = 'https://chancerylaneproject.org/climate-clauses/{}'
  gdrive_base_url = 'https://docs.google.com/document/d/{}'
  all_clause_ids = []
  all_clauses = []
  all_clause_objects = []
  selected_rows = multi_index.get_full_rows(row_ids)
  for row in selected_rows.iterrows():
    g_file = get_latest_file_for_clause(all_files, row[1]["Child's name"])
    if g_file:
      all_clauses.append(row[1]["Child's name"])
      modified_time = g_file.get('modifiedTime')
      if 'id' in g_file:
        all_clause_ids.append(g_file.get('id'))
      log(f'{row}')
      if row[1]['Has URL']:
        url = url_base_str.format(row[1]['URL arg'].lower().replace(' ', '-').replace('/', '-'))
      else:
        url = gdrive_base_url.format(g_file.get('id'))
      full_name = row[1]['Full name']
      all_clause_objects.append(DAObject(
        name=row[1]["Child's name"],
        full_name=full_name,
        modified_time=modified_time,
        url=url,
        docx_link=f'[{full_name}]({url})',
        file_id=g_file.get('id')
      ))
  prep_documents = True

Log

Traceback (most recent call last):
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 7777, in assemble
    exec_with_trap(question, user_dict)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 9116, in exec_with_trap
    exec(the_question.compute, the_dict)
  File "<code block>", line 4, in <module>
NameError: name 'prep_documents' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Has URL'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/webapp/server.py", line 7520, in index
    interview.assemble(user_dict, interview_status, old_user_dict, force_question=special_question)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 8056, in assemble
    raise the_error
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 7845, in assemble
    question_result = self.askfor(missingVariable, user_dict, old_user_dict, interview_status, seeking=interview_status.seeking, follow_mc=follow_mc, seeking_question=seeking_question)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 8504, in askfor
    exec_with_trap(question, user_dict)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/docassemble/base/parse.py", line 9116, in exec_with_trap
    exec(the_question.compute, the_dict)
  File "<code block>", line 15, in <module>
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/pandas/core/series.py", line 853, in __getitem__
    return self._get_value(key)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/pandas/core/series.py", line 961, in _get_value
    loc = self.index.get_loc(label)
  File "/usr/share/docassemble/local3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Has URL'
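Note that `<code block>`, line 4 in the first traceback corresponds to the `prep_documents` line of the main order block, and `<code block>`, line 15 in the final traceback corresponds to the `if row[1]['Has URL']:` line of the prep documents block, so the pandas frames (`Series.__getitem__` -> `Index.get_loc`) are the standard missing-label lookup path. That path reproduces in isolation:

    import pandas as pd

    # Minimal reproduction of the pandas frames in the traceback above.
    row = pd.Series({"Child's name": "Example", "Full name": "Example clause"})
    row['Has URL']   # KeyError: 'Has URL'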
BryceStevenWilley commented 2 years ago

What commit hash are you on? Both https://github.com/LemmaLegalConsulting/docassemble-tclpgoogledocsmerger/tree/redis and https://github.com/LemmaLegalConsulting/docassemble-tclpgoogledocsmerger/commit/92c9a33e9acec497541597b928935ad713263496 should fix that. I can't recreate with either of those.

nonprofittechy commented 2 years ago

It was the installed commit at apps.chancerylaneproject.org (I visited https://apps.chancerylaneproject.org/start/clause-search/)

I selected just "offsetting" as the only tag

*(screenshot attached)*

BryceStevenWilley commented 2 years ago

I'm still baffled, because I can't reproduce that. I select offsetting and get back 15 clauses perfectly fine.

Can you include your variable dump?

nonprofittechy commented 2 years ago

Using `?reset=1` cleared it up! Weird
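(One plausible explanation, offered only as an assumption: docassemble pickles interview variables into the session, so a session started before the spreadsheet gained the `Has URL` column could keep serving a cached DataFrame with the old schema even after the data and code were fixed, and `?reset=1` starts a fresh session. A toy illustration:)

    import pandas as pd
    import pickle

    # Assumption, not confirmed: a DataFrame pickled into an old session keeps
    # its old columns, so fixing the source data doesn't help until a reset.
    old_df = pd.DataFrame({"Child's name": ["Example"]})   # pre-'Has URL' schema
    restored = pickle.loads(pickle.dumps(old_df))          # survives requests
    print('Has URL' in restored.columns)                   # False -> KeyError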

nonprofittechy commented 2 years ago

Also, very fast! Still seeing the file corruption issue you highlighted, but the speedup is very impressive. We should see if any of that can be integrated upstream.