OpenInterpreter / open-interpreter

A natural language interface for computers
http://openinterpreter.com/
GNU Affero General Public License v3.0

When running in Codespaces, it consistently goes off the rails after a few rounds of conversation #1294

Open endolith opened 3 weeks ago

endolith commented 3 weeks ago

Describe the bug

After a few rounds of conversation, the AI starts making bizarre typos and offering to run "code" that is just a single nonsensical line. I have never seen these behaviors from GPT-4o itself, so I suspect a bug in the Open Interpreter software is corrupting the text somehow.

Reproduce

  1. Go to https://github.com/OpenInterpreter/open-interpreter
  2. Press , (comma) to open the repository in a Codespace
  3. Interact with Open Interpreter, having it write code and run it.

Expected behavior

It should respond with code that makes sense, as it usually does.

Screenshots

Weird typo:

[screenshot]

Bizarre "code":

[screenshot]

Open Interpreter version

0.2.6

Python version

3.11.9

Operating System name and version

Linux codespaces-2da81f 6.5.0-1021-azure #22~22.04.1-Ubuntu SMP Tue Apr 30 16:08:18 UTC 2024 x86_64 GNU/Linux

Additional context

Another example:

The maximum HDOP value found in the file 2024-06-04_00-32_Wed 0s 5m 1 V       
  Home.gpx is 4.9.                                                              

  Next, I'll proceed to extract and calculate the HDOP from the second file,    
  2024-06-05_00-37_Wed 0s 5m S7 Home.gpx.                                       

                                   Plan Recap                                   

   • Parse Second GPX File: Extract HDOP values from 2024-06-05_00-37_Wed 0s    
     5m S7 Home.gpx.                                                            
   • Calculate Maximum HDOP: Find the highest HDOP value from the second file.  
   • Compare with First File: Determine the overall maximum HDOP between the    
     two files.                                                                 

  import xml.etree.ElementTree as ET                                            

  # Path to the second GPX file                                                 
  gpx_file = '2024-06-05_00-37_Wed 0s 5m S7 Home.gpx'                           

  # Parse the GPX file                                                          
  root = ET.parse(gpx_file).getroot()                                           

  # Namespace handling                                                          
  ns = {'': 'http://www.topografix.com/GPX/1/1'}                                

  # Extract HDOP values                                                         
  dop_values = [float(trkpt.find('hdop', namespaces=ns).text) for trkpt in      
  root.findall('.//{http://www.topografix.com/GPX/1/1}trkpt', namespaces=ns)    
  if trkpt.find('hdop', namespaces=ns) is not None]                             

  # Highest HDOP value from the second file                                     
  max_hdop = max(dop_values) if dop_values else 'No HDOP data found'            

  max_hdop                                                                      

  Would you like to run this code? (y/n)

  y

  import xml.etree.ElementTree as ET                                            

  # Path to the second GPX file                                                 
  gpx_file = '2024-06-05_00-37_Wed 0s 5m S7 Home.gpx'                           

  # Parse the GPX file                                                          
  root = ET.parse(gpx_file).getroot()                                           

  # Namespace handling                                                          
  ns = {'': 'http://www.topografix.com/GPX/1/1'}                                

  # Extract HDOP values                                                         
  dop_values = [float(trkpt.find('hdop', namespaces=ns).text) for trkpt in      
  root.findall('.//{http://www.topografix.com/GPX/1/1}trkpt', namespaces=ns)    
  if trkpt.find('hdop', namespaces=ns) is not None]                             

  # Highest HDOP value from the second file                                     
  max_hdop = max(dop_values) if dop_values else 'No HDOP data found'            

  max_hdop                                                                      

  5.0                                                                           

  The maximum HDOP value found in the second file, 2024-06-05_00-37_Wed 0s 5m   
  S7 Home.gpx, is 5.0.                                                          

  Considering both files, the absolute maximum HDOP across all GPX files is     
  5.0.                                                                          

  If you need further analysis or want to check other files, feel free to ask!  

> So which phone is more accurate?

  In GPS data, a lower HDOP value indicates higher positional accuracy. Thus,   
  the device with the lower maximum HDOP value would typically be considered    
  to provide more accurate location data. Based on the files analyzed:          

   • File 2024-06-04_00-32_Wed 0s 5m 1 V Home.gpx had a maximum HDOP value of   
     4.9.                                                                       
   • File 2024-06-05_00-37_Wed 0s 5m S7 Home.gpx had a maximum HDOP value of    
     5.0.                                                                       

  Therefore, the GPS data from the first file (2024-06-04_00-32_Wed 0s 5m 1 V   
  Home.gpx) is slightly more accurate than the second, as it has a lower        
  maximum HDOP value.                                                           

Working fine so far, but then:

> Actually I limited the track recording to only values under 5 m accuracy, so that's not really a valid observation.  Can you get the minimum hdop of each instead?  Or the spread of hdop in some useful format?

  To provide a more comprehensive review of the GPS data's accuracy             
  represented by the HDOP values, we can calculate both the minimum HDOP and    
  provide a statistical summary (such as the spread, average, and range of      
  HDOP values) for each of the files.                                           

                                     Plan:                                      

   1 Extract HDOP Values Again: For each file, extract all HDOP values.         
   2 Calculate Statistical Information:                                         
      • Minimum HDOP: Identifies the most accurate point.                       
      • Maximum HDOP: Already known, for completeness in stat summary.          
      • Average HDOP: Provides a mean accuracy level.                           
      • Standard Deviation: Offers insight into the spread or variability of    
        the HDOP values.                                                        

  Let's begin by extracting and calculating the statistical information for     
  the first file (2024-06-04_00-32_Wed 0s 5m 1 V Home.gpx).                     

                                  Plan Recap:                                   

   • Re-Extract HDOP Values: For the detailed statistical analysis of the       
     first file.                                                                
   • Calculate Summary Statistics: Minimum, maximum, average, and standard      
     deviation for HDOP values.                                                 

  sender                                                                        

  Would you like to run this code? (y/n)

  n

> Why did you say `sender`?

  That was a mistake; the text "sender" has no relevance to our context. Let's  
  proceed with executing the proper code for analyzing HDOP values.             

                              Revised Plan Recap:                               

   1 Re-Extract HDOP Values: For the file 2024-06-04_00-32_Wed 0s 5m 1 V        
     Home.gpx.                                                                  
   2 Calculate Summary Statistics:                                              
      • Minimum HDOP.                                                           
      • Maximum HDOP (already known but included for the complete stat          
        summary).                                                               
      • Average HDOP.                                                           
      • Standard Deviation (to assess the spread of HDOP values).               

  Let's start with re-extracting the HDOP values and calculating the            
  statistics for the first file.                                                

  works.execute                                                                 

  Would you like to run this code? (y/n)

  n

> What is works?

  It appears there was another mistake in my execution instruction. I           
  appreciate your patience. Let's correctly craft the code to extract and       
  calculate the statistical summary of HDOP values for the first GPX file.      

                                  Plan Recap:                                   

   • Extract HDOP Values: For the first GPX file, 2022-06-04_00-32_Wed 0s 5m 1  
     V Home.gpx.                                                                
   • Calculate Statistics: Including minimum, maximum, average, and standard    
     deviation of HDOP values.                                                  

  Now, I will properly execute the Python code to parse the HDOP values and     
  calculate the required statistics.                                            

  collection_functions                                                          

  Would you like to run this code? (y/n)

  n

> I don't understand why you're writing these one line random things
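
For reference, here is what the statistics step the model kept replacing with one-word "code" could have looked like. This is my own sketch, not anything the model produced; the helper name and the inline sample document are mine, and real use would parse the actual GPX files with ET.parse(path).getroot():

```python
import statistics
import xml.etree.ElementTree as ET

NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}

def hdop_stats(root):
    """Summarize the HDOP values found in a parsed GPX document."""
    values = [
        float(hdop.text)
        for trkpt in root.findall('.//gpx:trkpt', NS)
        if (hdop := trkpt.find('gpx:hdop', NS)) is not None
    ]
    return {
        'min': min(values),
        'max': max(values),
        'mean': statistics.mean(values),
        'stdev': statistics.stdev(values) if len(values) > 1 else 0.0,
    }

# Minimal inline GPX document for demonstration only
sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="0" lon="0"><hdop>4.9</hdop></trkpt>
    <trkpt lat="0" lon="0"><hdop>1.1</hdop></trkpt>
  </trkseg></trk>
</gpx>"""
print(hdop_stats(ET.fromstring(sample)))
```

The min and standard deviation answer my original question (minimum HDOP and spread) without assuming the capped maximum is meaningful.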
shamrockmuffin commented 6 hours ago

My thought is limitations to context window becoming murky as CLI chat's grow longer, formatting/engineering issues for seamless integration of the OpenAI LLM into an open-interpreter environment... I notice sometimes within my own oi preojects that gpt-4o-05-13 will introduce artifacts inside generated codeboxes that have no simple reason or excuse to exist, , misread file dependencies, forget basic powershell commands and general programming principles, exhaust itself of avaible API tokens (careful when running some of the GPT-4 models bc it can get out of pocket quick) if auto_run is enabled, a very aggrivating, bad habit of GPT-4o running on a windows 11 environment model is growing into his Dad . the repeating codeboxes str sort of given a human monitored environment and 128k context window one could try to explore implementation of narrower prompts that may ease the LLM into a more sound understanding of its users' environments. The %save_message and %reset commands are solid current methods i have discovered that can make a to give it a fresh, idea-to-prompt environment to somehow introduce appropriately within OI's system_instructions. make sense given its high praise for intelligense and data analyses/parsing capabilities... of your environment or system_instructions being modified beyond its advised settings according to the documentation on the Interpreter website.