N0taN3rd / wail

:whale2: One-Click User Instigated Preservation
http://matkelly.com/wail
GNU General Public License v3.0

Security pop-up on archiving tweets #71

Open weiglemc opened 7 years ago

weiglemc commented 7 years ago

Maybe this is a function of my Mac security settings, but every time a tweet is archived, I get the attached security pop-up.

It may be that all we need to do to address this is add some documentation on how to adjust the settings.

[Screenshot, 2017-02-15 8:40:27 AM: the security pop-up]

weiglemc commented 7 years ago

Here's the screenshot of my Security & Privacy System Prefs panel:

[Screenshot, 2017-03-02 2:46:29 PM: Security & Privacy System Preferences panel]

machawk1 commented 7 years ago

@N0taN3rd I am getting asked this repeatedly in version 1.1.0b2.5, even after clicking "allow" each time. It happens even when WAIL is in the background and crawls are running. I presume some background communication with pywb is invoking the wayback binary. Any idea why the "allow" is not sticking? This is on macOS 10.12.4.

N0taN3rd commented 7 years ago

@machawk1 That is a really good question, and one I have not been able to put my finger on concretely. On the WAIL side, all network requests are made either with the Node.js built-in http/https modules (which is what the libraries WAIL uses come down to when you drill into their code) or by Electron's Chromium when initiating the single-page crawl. Libraries currently used by WAIL that make network requests:

So the network permission granted to WAIL itself covers both of these, which rules WAIL out.

Heritrix makes many network requests, but it is Java-based and, when launched, runs on the 1.7 JVM. The only compilation involved is the JVM's JIT of the class files in Heritrix's JAR. So Heritrix plays nicely because of the JVM and how Java applications are designed, which allows the network permission to stick.

Pywb, i.e. the wayback binary. This one is interesting because its behavior depends on the output of PyInstaller and how it links the executables to the packaged Python runtime. PyInstaller packages all the .so/.dylib etc. files used by the Python version on the compiling machine. From the PyInstaller docs:

  1. First process: bootloader starts.

    1. If one-file mode, extract bundled files to temppath/_MEIxxxxxx
    2. Modify various environment variables:
      • Linux: save original value of LD_LIBRARY_PATH into LD_LIBRARY_PATH_ORIG, prepend our path to LD_LIBRARY_PATH.
      • AIX: same thing, but using LIBPATH and LIBPATH_ORIG.
      • OSX: unset DYLD_LIBRARY_PATH.
    3. Set up to handle signals for both processes.
    4. Run the child process.
    5. Wait for the child process to finish.
    6. If one-file mode, delete temppath/_MEIxxxxxx.

  2. Second process: bootloader itself started as a child process.

    1. On Windows set the activation context.
    2. Load the Python dynamic library. The name of the dynamic library is embedded in the executable file.
    3. Initialize Python interpreter: set sys.path, sys.prefix, sys.executable.
    4. Run python code.

Running Python code requires several steps:

  1. Run the Python initialization code which prepares everything for running the user’s main script. The initialization code can use only the Python built-in modules because the general import mechanism is not yet available. It sets up the Python import mechanism to load modules only from archives embedded in the executable. It also adds the attributes frozen and _MEIPASS to the sys built-in module.
  2. Execute any run-time hooks: first those specified by the user, then any standard ones.
  3. Install python “egg” files. When a module is part of a zip file (.egg), it has been bundled into the ./eggs directory. Installing means appending .egg file names to sys.path. Python automatically detects whether an item in sys.path is a zip file or a directory.
  4. Run the main script.
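To see which of these modes a given process is in, the `frozen` and `_MEIPASS` attributes mentioned in the docs can be read from `sys` at run time. The following is a minimal sketch only: the `describe_runtime` helper and its `sys_module` parameter are hypothetical names for illustration and are not part of pywb, WAIL, or PyInstaller.

```python
import sys


def describe_runtime(sys_module=sys) -> dict:
    """Report whether the interpreter was started by a PyInstaller bootloader.

    `sys_module` is an illustrative parameter (defaults to the real `sys`)
    so the function can be exercised without building a frozen binary.
    """
    # The bootloader's initialization code adds `frozen` and `_MEIPASS` to sys;
    # a regular interpreter has neither attribute.
    frozen = bool(getattr(sys_module, "frozen", False))
    bundle_dir = getattr(sys_module, "_MEIPASS", None) if frozen else None
    return {
        "frozen": frozen,
        "bundle_dir": bundle_dir,
        "executable": sys_module.executable,
    }


if __name__ == "__main__":
    print(describe_runtime())
```

Run from a normal interpreter this reports `frozen` as `False`; inside a PyInstaller-built binary, `bundle_dir` would be the directory the bootloader set up for that launch.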

As you can see, each time a PyInstaller "installed" program is run, it sets up a unique Python VM, if you will. On top of that, pywb uses a WSGI server, which does its own thing. This is where I believe the issue originates: because of the internals of pywb and how PyInstaller bootstraps everything, the "application/process" unique identifier for the permission is not found each time pywb is launched or the WSGI internals spin up a new connection handler. WAIL repeatedly restarts pywb because new collections are not picked up without a restart.
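One way the identifier could change between launches, if the wayback binary is built in one-file mode (an assumption; the thread does not say which mode is used), is that the bundled files are extracted to a fresh `_MEIxxxxxx` directory on every start. The sketch below only imitates that extract/delete cycle with the standard library. The `simulate_onefile_launch` name and the `wayback` file name inside the directory are illustrative, and whether the macOS firewall keys its decision on this path is not confirmed here.

```python
import os
import shutil
import tempfile


def simulate_onefile_launch() -> str:
    """Imitate one launch: create a _MEI-prefixed temp dir, then delete it.

    Returns the path a bundled file would have had during that launch.
    """
    extract_dir = tempfile.mkdtemp(prefix="_MEI")  # bootloader step 1.1
    try:
        # Hypothetical location of a bundled file for this launch.
        return os.path.join(extract_dir, "wayback")
    finally:
        shutil.rmtree(extract_dir)  # bootloader step 1.6


if __name__ == "__main__":
    first = simulate_onefile_launch()
    second = simulate_onefile_launch()
    print(first)
    print(second)
    print("same path on both launches:", first == second)
```

Since the directory name is randomized per launch, anything that identifies the program by the location of its extracted files would see a different program after every restart.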