AlistairCurd / PERPL-Python3

Apache License 2.0

PERPL (Pattern Extraction from Relative Positions of Localisations)

This project provides functions for finding relative positions between points in 3D space and plotting them as distance histograms for single molecule localisation microscopy data, e.g. direct stochastic optical reconstruction microscopy (dSTORM) or photoactivated localisation microscopy (PALM). It also provides functions for fitting in silico model relative position distributions to those obtained from experimental localisations, allowing the user to select the most likely structural model to describe the experimental data.

The software uses a file containing localisation data, analyses the distribution of relative positions between them within a certain maximum distance ('filter distance') and outputs and saves these relative positions. The filter distance is applied in 3D (or in 2D, as required). The software compares these outputs to relative position distributions obtained from synthetic localisation data models based on hypotheses of structural features. It outputs model fits and relative likelihoods for these models.
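The relative-position step described above can be illustrated with a minimal sketch. This is not PERPL's implementation: the function name, the tuple layout and the example coordinates are all assumptions made for illustration, and the brute-force pair loop is only practical for small inputs.

```python
import math
from itertools import combinations


def relative_positions(locs, filter_distance):
    """Return (dx, dy, dz, distance) for every pair of localisations
    separated by no more than filter_distance (same units as the input).
    """
    results = []
    for a, b in combinations(locs, 2):
        delta = tuple(q - p for p, q in zip(a, b))
        distance = math.hypot(*delta)  # Euclidean distance, works in 2D or 3D
        if distance <= filter_distance:
            results.append(delta + (distance,))
    return results


# Four hypothetical 3D localisations (nm); the last one is far from the rest.
locs = [(0.0, 0.0, 0.0), (30.0, 40.0, 0.0), (0.0, 0.0, 120.0), (500.0, 0.0, 0.0)]
pairs = relative_positions(locs, filter_distance=200.0)
for dx, dy, dz, dist in pairs:
    print(dx, dy, dz, round(dist, 1))
```

Only three of the six possible pairs fall within the 200 nm filter distance here. For real data sets with many thousands of localisations, a spatial index would be needed in place of the all-pairs loop.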

The algorithms were developed by Alistair Curd of the University of Leeds, from 30 July 2018.

Copyright 2018 Peckham Lab

It was ported from Python 2 to Python 3 and made more user friendly by Joanna Leng at the University of Leeds who was funded by EPSRC as a Research Software Engineering Fellow (EP/R025819/1).

History before 2022-12-20: On 2022-12-20 this project was migrated from Bitbucket (https://bitbucket.org/apcurd/perpl-python3/). We were unable to port the whole history because it included files that were too large for GitHub. The current repository (https://github.com/AlistairCurd/PERPL-Python3) holds development of the master branch from that point.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

DEVELOPED WITH:

The latest version was developed using Python 3.11 and Mamba on a Windows 10 system, and with Windows Subsystem for Linux 2 (WSL2) with Ubuntu 22.04.3 LTS. Previous versions were also developed on a CentOS system and executed a limited number of times on an Apple Mac.

QUICK START:

Prerequisites

You will need to be able to create a Python 3.11 environment, e.g. with Anaconda, Miniconda or Mamba. We recommend Mamba.

Installation

  1. Create a Python 3.11 environment:

    e.g., in Miniforge (see here):

    mamba create -n perpl python=3.11

  2. Activate the environment:

    e.g., mamba activate perpl

  3. Install perpl:

    • To use without downloading this repository: pip install perpl
    • To use notebooks or develop:
      1. Clone or download this repository
      2. Navigate to your copy of this repository, then pip install .

Run scripts

Type relpos to execute relative_positions.py. (If you run into admin rights issues on Windows, use WSL2 or navigate to src/perpl and type python relative_positions.py instead.)

This analyses localisation data that has been processed into X and Y (2D) or X, Y and Z (3D) coordinates stored in a text or .csv file. You will be asked to provide input, such as the input data filename, to the script as it executes. Example data files can be found as described in the DATA section. A relatively small data file useful for testing the software is Nup107_3D_10000_from_36297_locs.csv; for this file, choose 3D analysis and a filter distance of 200 nm. We plan to upload the data to Zenodo and include instructions here on how to access them.

Type rotsym2d to execute rot_2d_symm_fit.py. (As above, you can also navigate to src/perpl and type python rot_2d_symm_fit.py.)

This will read in output data from the relative_positions.py script and compare it to a model of localisations with 2D rotational symmetry. Again you will be asked to provide input, which should be a list of relative positions, not a list of localisations. A good test file for this script is Nup107_SNAP_3D_GRROUPED_10nmZprec_PERPL-relpos_200.0filter.csv, which itself was generated by running relative_positions.py on Nup107_SNAP_3D_GRROUPED_10nmZprec.txt.

NB These scripts use meaningful filenames and directories to store the results. This sometimes creates long paths, which Windows in particular finds difficult to handle. If this happens, use the -s flag at the end of these commands to switch to a shorter naming convention.
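As a rough guide to when the -s flag might be needed, the sketch below checks a candidate output path against the classic Windows MAX_PATH limit of 260 characters. The helper name and the example paths are hypothetical and are not part of PERPL.

```python
from pathlib import PureWindowsPath

WINDOWS_MAX_PATH = 260  # classic Windows limit, counting the terminating null


def exceeds_windows_limit(directory, filename):
    """Return True if directory\\filename is too long for classic Windows APIs."""
    full_path = PureWindowsPath(directory) / filename
    return len(str(full_path)) >= WINDOWS_MAX_PATH


# Hypothetical output locations.
print(exceeds_windows_limit(r"C:\data", "short_name.csv"))
print(exceeds_windows_limit(r"C:\data", "x" * 300 + ".csv"))
```

PureWindowsPath is used so that the check behaves the same way on Linux and macOS.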

USAGE:

There are two Python scripts that can be executed from the command line in a shell with Python 3 available: relative_positions.py and rot_2d_symm_fit.py.

Each of these can be run from the command line in two ways:

  1. They can be executed and all the necessary parameters for the code to execute successfully can be provided as flags/arguments at the command line.
  2. They can be executed from the command line where no flags/arguments are provided. In this case the user is asked to provide the necessary information as the script executes via a file browser for the input file and via the command line for all other information.

In both these cases the user can choose to execute the code in verbose mode and monitor its progress via output printed to the command line (standard output).
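The two modes described above follow a common command-line pattern, sketched below with argparse. This is an illustration only, not PERPL's own parser: the short flags -i, -f and -s appear in this README, but the long option names, the prompts and the default filter distance are assumptions.

```python
import argparse


def parse_or_prompt(argv, prompt=input):
    """Use flags when supplied; otherwise ask the user for each value."""
    parser = argparse.ArgumentParser(description="Two-mode CLI sketch")
    parser.add_argument("-i", "--input", help="input data file")
    parser.add_argument("-f", "--filter", type=float, help="filter distance (nm)")
    parser.add_argument("-s", "--short", action="store_true",
                        help="use short output names")
    args = parser.parse_args(argv)

    if args.input is None:
        # Interactive mode: the README describes a file browser for this
        # step; this sketch simply prompts on the command line instead.
        args.input = prompt("Input file: ")
        args.filter = float(prompt("Filter distance (nm): "))
    elif args.filter is None:
        args.filter = 200.0  # hypothetical default, not taken from PERPL
    return args


# Silent mode: everything comes from the argument list.
args = parse_or_prompt(["-i", "data_file.csv", "-f", "100"])
print(args.input, args.filter, args.short)
```

Passing the prompt function as a parameter keeps the interactive branch easy to exercise in tests without a terminal.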

Each time these scripts run, an HTML report is created in the directory where the input data is stored. The paths and filenames provide information on when the script executed and the parameters selected, and these details are also documented in the report. The image files used in the report are saved to the directory with the report. You can view the report in a web browser on the machine where it was created by double-clicking on the HTML file in the directory. If you wish to share the file, you can print it (or save it as a PDF if correctly configured) via your web browser. If you wish to share the HTML report as HTML, remember to share the image files with the HTML file.
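One way to keep a report and its images together when sharing is to bundle them into a single archive. The sketch below is a hypothetical helper, not part of PERPL; the set of image suffixes is an assumption and may need adjusting to match the files that your report directory actually contains.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".svg"}  # assumed image formats


def bundle_report(report_dir, archive_path):
    """Zip every HTML and image file found directly in report_dir."""
    report_dir = Path(report_dir)
    wanted = [p for p in sorted(report_dir.iterdir())
              if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES | {".html"}]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in wanted:
            # Store files flat so the HTML still finds its images after unzipping.
            archive.write(path, arcname=path.name)
    return [p.name for p in wanted]
```

For example, bundle_report("path/to/report_directory", "report.zip") would return the list of file names that were archived.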

There are also Jupyter notebooks for fitting models to the data and producing plots and numerical results. These may be run interactively, and modified, e.g. for path to the input data and choice of model to fit to the input data. These may be found and run in a web browser by typing jupyter notebook in the command prompt.

relative_positions.py

To execute interactively, provide no flags (arguments) and type:

relpos (or see above for rights issues on Windows)

To execute silently with default values, all you need is to include a data file; type:

relpos -i data_file.csv

To execute silently with your own values you also need to include a data file; type for example:

relpos -i data_file.csv -d 2 -f 200 -z 12 -s

To get information on the flags and usage, type:

relpos -h

This script can take several minutes to run, depending on the size (number of localisations) and density of the input data.
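Because the silent mode needs no user interaction, it can be driven from Python to process several files in turn. The sketch below uses only the -i, -f and -s flags shown above; the helper names are hypothetical, and the batch function assumes that relpos is on the PATH of the active environment.

```python
import shutil
import subprocess


def relpos_command(data_file, filter_distance=None, short_names=False):
    """Build the argument list for a silent relpos run."""
    command = ["relpos", "-i", str(data_file)]
    if filter_distance is not None:
        command += ["-f", str(filter_distance)]
    if short_names:
        command.append("-s")
    return command


def run_batch(data_files, **options):
    """Run relpos once per file, stopping at the first failure."""
    if shutil.which("relpos") is None:
        raise RuntimeError("relpos not found; activate the perpl environment")
    for data_file in data_files:
        subprocess.run(relpos_command(data_file, **options), check=True)


print(relpos_command("data_file.csv", filter_distance=200, short_names=True))
# ['relpos', '-i', 'data_file.csv', '-f', '200', '-s']
```

Building the command as a list, rather than a single string, avoids shell quoting problems with file names that contain spaces.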

rot_2d_symm_fit.py

This script is executed to compare a model with rotational 2D symmetry with experimental fluorescence localisation microscopy data. The script reads in output data generated by relative_positions.py and compares it to a model that it generates.
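To give a feel for what 2D rotational symmetry implies for a distance histogram, the sketch below lists the distances between the vertices of a regular polygon inscribed in a circle. This is purely geometric and is not the model fitted by rot_2d_symm_fit.py, which may include further terms not shown here; the function name and the example values (8-fold symmetry, 100 nm diameter) are assumptions.

```python
import math


def symmetric_ring_distances(n_fold, diameter):
    """Distances from one vertex to each other vertex of a regular
    n_fold polygon inscribed in a circle of the given diameter.
    """
    # The chord spanning k steps around the ring has length D * sin(pi * k / n).
    return [diameter * math.sin(math.pi * k / n_fold) for k in range(1, n_fold)]


# Hypothetical example: 8-fold symmetry on a 100 nm diameter ring.
for k, d in enumerate(symmetric_ring_distances(8, 100.0), start=1):
    print(f"{k} step(s) around the ring: {d:.1f} nm")
```

The list is symmetric (k steps one way equals n - k steps the other way), so an ideal structure of this kind would contribute peaks at only a few characteristic distances.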

To execute interactively, provide no flags (arguments) and type:

rotsym2d

To execute silently with default values, all you need is to include a data file; type:

rotsym2d -i data_file_output_from_relative_positions.csv

To execute silently with your own values, you need to include a data file and filter distance (the filter distance only has an effect if it is less than that used in generating the input data when executing relative_positions.py); type for example:

rotsym2d -i data_file_output_from_relative_positions.csv -f 100

To get information on the flags and usage, type:

rotsym2d -h
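The note above about the filter distance can be illustrated with a short sketch: a smaller filter removes relative positions from the input, whereas a larger one cannot add any that were never saved. The function and the example values are hypothetical and do not reflect PERPL's internal data layout.

```python
import math


def apply_filter(relative_positions, filter_distance):
    """Keep relative positions whose length is within filter_distance."""
    return [rp for rp in relative_positions
            if math.hypot(*rp) <= filter_distance]


# Hypothetical (dx, dy) relative positions in nm, saved with a 200 nm filter.
saved = [(30.0, 40.0), (90.0, 120.0), (-45.0, 60.0)]
print(len(apply_filter(saved, 100.0)))  # 2: the 150 nm separation is dropped
print(len(apply_filter(saved, 300.0)))  # 3: nothing beyond 200 nm was saved
```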

Examples of usage are in the bash script command_line_demo.sh. This is a Linux script and will not run on Windows in an Anaconda shell. If you transfer this script to a Linux system you may need to run the dos2unix command on it to make it work, as well as chmod u+x command_line_demo.sh. If you create the data-perpl directory as described in the DATA section then the script should pick up the data without you having to change the path in the script.

Jupyter notebooks (.ipynb)

To start the Jupyter notebook environment, go to an Anaconda shell that is running the PERPL environment and type:

jupyter notebook

The notebook environment will open in your web browser. Select a notebook (.ipynb file). Select Python 3 for the kernel, if necessary. If working in WSL2:

NB Each notebook requires you to load data into it (see DATA section).

DOCUMENTATION

This was formerly generated with Doxygen. It needs to be regenerated.

DATA

Test data for this software, or examples with which the software can be used, can be found at https://bitbucket.org/apcurd/perpl_test_data. These files are:

The shell script command_line_demo.sh and the Jupyter notebooks will find these files and load the required data from relative paths. This will work if the data is placed in a directory called data-perpl, which is within the same parent directory (perpl-home, or a name of your choice) as the PERPL software directory (perpl-python3). A schematic of this directory structure is given below.

    perpl-home/
        perpl-python3/
        data-perpl/

You can also download the data into any directory on your system and edit the file name and path in the relevant script/notebook, if you prefer.
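If you write your own scripts against the same layout, the data directory can be located relative to your copy of the repository. The sketch below assumes the data-perpl directory sits alongside the software directory, as described above; the helper name is hypothetical.

```python
from pathlib import Path


def default_data_dir(repo_dir):
    """Return the sibling data-perpl directory for a given repository copy."""
    return Path(repo_dir).resolve().parent / "data-perpl"


# Hypothetical location of the software directory.
data_dir = default_data_dir("perpl-home/perpl-python3")
data_file = data_dir / "Nup107_3D_10000_from_36297_locs.csv"
print(data_file.name)
```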

FILES INCLUDED:

Python Code

In src/perpl:

Jupyter notebooks

Unittests

There are unit tests in the tests directory. These will be of interest to a software engineer who wishes to extend this project. They can be run from a shell with Python 3 available, using the command:

python -m unittest discover -s tests