ReproNim / reproman

ReproMan (AKA NICEMAN, AKA ReproNim TRD3)
https://reproman.readthedocs.io

ReproZip: Extend information about Debian package origins (repositories etc) #8

Closed yarikoptic closed 7 years ago

yarikoptic commented 8 years ago

So that we have sufficient information to reproduce any given environment later on.

yarikoptic commented 8 years ago

Commands which could be used:

E.g.

$> apt-cache policy libc6-dev                                                                              
libc6-dev:   
  Installed: 2.19-18+deb8u4
  Candidate: 2.19-18+deb8u4
  Version table:
 *** 2.19-18+deb8u4 0
        500 http://debian.csail.mit.edu/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
     2.19-18+deb8u3 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
$> apt-cache showpkg libc6-dev | head 
Package: libc6-dev
Versions: 
2.19-18+deb8u4 (/var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_main_binary-amd64_Packages) (/var/lib/dpkg/status)
 Description Language: 
                 File: /var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_main_binary-amd64_Packages
                  MD5: 1bbdc717d9acdb44db940928d570e749
 Description Language: en
                 File: /var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_main_i18n_Translation-en
                  MD5: 1bbdc717d9acdb44db940928d570e749
$> head /var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_Release                   
Origin: Debian
Label: Debian
Suite: stable
Version: 8.5
Codename: jessie
Date: Sat, 04 Jun 2016 13:24:54 UTC
Architectures: amd64 arm64 armel armhf i386 mips mipsel powerpc ppc64el s390x
Components: main contrib non-free
Description: Debian 8.5 Released 04 June 2016
MD5Sum:
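A minimal sketch of turning the apt-cache policy output above into data, assuming the C-locale layout shown here (the layout may differ between apt versions; parse_apt_cache_policy and VERSION_LINE are hypothetical names, not part of any existing tool):

```python
import re

# A version line in the "Version table:" section: optionally prefixed with
# "***" (marks the installed version), followed by a numeric priority.
VERSION_LINE = re.compile(r"^(?:\*\*\* )?(\S+) (\d+)$")


def parse_apt_cache_policy(text):
    """Parse `apt-cache policy <pkg>` output for a single package.

    Returns (installed, candidate, table), where table maps each version
    to a list of (priority, source) pairs in the order apt printed them.
    """
    installed = candidate = None
    table = {}
    current = None
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("Installed:"):
            installed = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Candidate:"):
            candidate = stripped.split(":", 1)[1].strip()
        elif stripped == "Version table:":
            in_table = True
        elif in_table:
            match = VERSION_LINE.match(stripped)
            if match:
                current = match.group(1)
                table[current] = []
            elif current is not None:
                # e.g. "500 http://... jessie/main amd64 Packages"
                priority, source = stripped.split(None, 1)
                table[current].append((int(priority), source))
    return installed, candidate, table


SAMPLE = """\
libc6-dev:
  Installed: 2.19-18+deb8u4
  Candidate: 2.19-18+deb8u4
  Version table:
 *** 2.19-18+deb8u4 0
        500 http://debian.csail.mit.edu/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
     2.19-18+deb8u3 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
"""

installed, candidate, table = parse_apt_cache_policy(SAMPLE)
```

In practice the text would come from something like subprocess.run(["apt-cache", "policy", "libc6-dev"], capture_output=True, text=True, env={**os.environ, "LC_ALL": "C"}).stdout; forcing the C locale keeps the labels untranslated.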

so we would generate:

distributions:
  - name: debian-1
    origin: Debian
    label: Debian
    suite: stable
    version: "8.5"
    codename: jessie
    date: Sat, 04 Jun 2016 13:24:54 UTC
    components: main contrib non-free
    architectures: amd64

packages:
  - name: libc6-dev
    version: 2.19-18+deb8u4   # from apt-cache policy
    architecture: amd64       # as identified from the /var/..._<arch=amd64>_Packages filename
    distribution: debian-1
    component: main           # as identified from the /var/..._<component=main>_binary-<arch>_Packages filename
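The architecture and component noted in those comments can be derived mechanically from the index filename, and the same name also points at the matching Release file. A sketch, assuming apt's usual lists naming scheme in which each "/" of the repository URL becomes "_" (compressed or otherwise renamed indexes are not handled; parse_lists_filename is a hypothetical helper):

```python
import os


def parse_lists_filename(path):
    """Split an apt lists filename for a binary Packages index.

    Assumes the layout <site>_dists_<suite>_<component>_binary-<arch>_Packages.
    Turning "_" back into "/" is a heuristic: it is wrong for URLs that
    themselves contain "_".
    """
    directory, filename = os.path.split(path)
    stem, component, binary_arch, kind = filename.rsplit("_", 3)
    if kind != "Packages" or not binary_arch.startswith("binary-"):
        raise ValueError("not a binary Packages index: %s" % filename)
    site, sep, suite = stem.partition("_dists_")
    if not sep:
        raise ValueError("no '_dists_' part in: %s" % filename)
    return {
        "site": site.replace("_", "/"),
        "suite": suite.replace("_", "/"),  # e.g. jessie or jessie/updates
        "component": component,
        "architecture": binary_arch[len("binary-"):],
        # The repository metadata sits next to the index, either as
        # InRelease (inline-signed) or as Release.
        "release_candidates": [
            os.path.join(directory, stem + "_InRelease"),
            os.path.join(directory, stem + "_Release"),
        ],
    }


info = parse_lists_filename(
    "/var/lib/apt/lists/"
    "debian.csail.mit.edu_debian_dists_jessie_main_binary-amd64_Packages"
)
```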

I think this should be sufficient information to later identify the apt repository(ies) (on archive.debian.org or snapshot.debian.org) that would provide this particular package.
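For snapshot.debian.org, the Release Date: field could be turned into a timestamped apt source. A sketch, assuming snapshot's /archive/<archive>/<YYYYMMDDTHHMMSSZ>/ URL layout, which serves the archive as it was at that moment (snapshot_sources_line is a hypothetical helper):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def snapshot_sources_line(release_date, codename, components, archive="debian"):
    """Build a sources.list line pointing at snapshot.debian.org.

    release_date is the Release file's Date: field, e.g.
    "Sat, 04 Jun 2016 13:24:54 UTC".
    """
    when = parsedate_to_datetime(release_date)
    if when.tzinfo is None:
        # No usable zone in the string; Release dates are given in UTC.
        when = when.replace(tzinfo=timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    url = "http://snapshot.debian.org/archive/%s/%s/" % (archive, stamp)
    return "deb %s %s %s" % (url, codename, " ".join(components))


line = snapshot_sources_line("Sat, 04 Jun 2016 13:24:54 UTC", "jessie", ["main"])
```

Since such Release files are long past their Valid-Until date, apt may refuse them unless run with -o Acquire::Check-Valid-Until=false.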

For a Python implementation, the deb822 module of python-debian (imported as: from debian import deb822) could provide many useful helpers for reading those Release and possibly other files.

In [6]: deb822.Release(codecs.open('/var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_Release', 'r', 'utf-8')).keys()
Out[6]: 
['Origin',
 'Label',
 'Suite',
 'Version',
 'Codename',
 'Date',
 'Architectures',
 'Components',
 'Description',
 'MD5Sum',
 'SHA1',
 'SHA256']

It is even possible to extract the version of an installed package directly from the 'status' file:

In [19]: [p for p in deb822.Packages.iter_paragraphs(codecs.open('/var/lib/dpkg/status', 'r', 'utf-8')) if p['Package'] == 'libc6-dev'][0]['Version']
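Where python-debian is not available, both lookups can be approximated with the standard library alone. This is a simplified stand-in, not deb822's API: it ignores PGP signatures (as in InRelease), comments and other details deb822 handles, and parse_paragraphs/installed_version are hypothetical names:

```python
def parse_paragraphs(lines):
    """Yield RFC822-style paragraphs (as dicts) from Debian control data.

    Blank lines separate paragraphs; lines starting with whitespace
    continue the value of the previous field.
    """
    fields, key = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if fields:
                yield fields
            fields, key = {}, None
        elif line[0] in " \t":
            if key is not None:  # continuation of the previous field
                fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    if fields:
        yield fields


def installed_version(status_lines, package):
    """Return the Version of `package` if dpkg marks it as installed.

    With multiarch, several paragraphs may share a Package name; the
    first installed one wins here.
    """
    for para in parse_paragraphs(status_lines):
        if para.get("Package") == package and para.get("Status", "").endswith(" installed"):
            return para.get("Version")
    return None


RELEASE_HEAD = """\
Origin: Debian
Suite: stable
Version: 8.5
Codename: jessie
Date: Sat, 04 Jun 2016 13:24:54 UTC
Components: main contrib non-free
"""
release = next(parse_paragraphs(RELEASE_HEAD.splitlines()))

# Illustrative status entry, reduced to the fields used here; on a real
# system: with open("/var/lib/dpkg/status", encoding="utf-8") as f: ...
STATUS = """\
Package: libc6-dev
Status: install ok installed
Version: 2.19-18+deb8u4
"""
version = installed_version(STATUS.splitlines(), "libc6-dev")
```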

I am not yet sure how, in this Pythonic way, to link a package to its '_Packages' file in order to identify the corresponding (In)Release file. I think showpkg just efficiently scans all of those, so we might as well use showpkg's command-line output.
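If we go with showpkg, the "Versions:" section already pairs each version with the index files it came from. A sketch that reads it, assuming the layout in the output above (showpkg's text is not a stable interface; showpkg_versions is a hypothetical helper):

```python
import re


def showpkg_versions(text):
    """Map each version in `apt-cache showpkg` output to its index files.

    Reads the unindented lines after "Versions:" until the next section
    header (e.g. "Reverse Depends:"); indented lines describe the
    preceding version and are skipped.
    """
    versions = {}
    in_versions = False
    for line in text.splitlines():
        if line.startswith("Versions:"):
            in_versions = True
            continue
        if not in_versions or not line.strip() or line[0].isspace():
            continue
        match = re.match(r"^(\S+) (\(.*\))$", line.rstrip())
        if not match:
            break  # reached the next section header
        versions[match.group(1)] = re.findall(r"\(([^)]*)\)", match.group(2))
    return versions


SAMPLE = """\
Package: libc6-dev
Versions:
2.19-18+deb8u4 (/var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_main_binary-amd64_Packages) (/var/lib/dpkg/status)
 Description Language:
                 File: /var/lib/apt/lists/debian.csail.mit.edu_debian_dists_jessie_main_binary-amd64_Packages
                  MD5: 1bbdc717d9acdb44db940928d570e749
"""

versions = showpkg_versions(SAMPLE)
```

Each '_Packages' entry can then be mapped to its (In)Release file by dropping the trailing _<component>_binary-<arch>_Packages part of its name.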

Note 1: we might/should also be able to ignore or override the distribution: of a package, so that we could, e.g., regenerate, on a Debian base, an environment originally built on Ubuntu. Overrides similar to those for the version: specification should be allowed.
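One possible semantics for such overrides, sketched over the YAML-like package records above: None means "ignore what was recorded and let the target decide", any other value replaces the recorded one. The apply_overrides helper and the "debian-2" distribution are hypothetical:

```python
def apply_overrides(spec, overrides):
    """Return a copy of a recorded package spec with overrides applied.

    A value of None drops the field (ignore the recorded value); any
    other value replaces it. The recorded spec is left untouched.
    """
    result = dict(spec)
    for field, value in overrides.items():
        if value is None:
            result.pop(field, None)
        else:
            result[field] = value
    return result


recorded = {
    "name": "libc6-dev",
    "version": "2.19-18+deb8u4",
    "architecture": "amd64",
    "distribution": "debian-1",
}
# Rebuild on another (hypothetical) base: switch the distribution and
# drop the pinned version so the target's own version is used.
rebuilt = apply_overrides(recorded, {"distribution": "debian-2", "version": None})
```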

yarikoptic commented 8 years ago

@rbuccigrossi, as you have played the most with reprozip, do you think this is viable to implement "cheaply" within reprozip?

rbuccigrossi commented 8 years ago

I became really enamored with reprozip's ability to record the execution of an experiment and play it back. But faced with this question, for playback it really is only a small step ahead of Ansible, Docker, Packer, and other environment-creation scripts, because:

  1. It was designed to play back a single recorded experiment, and therefore has a rather focused YAML format for environment creation
  2. It supports multiple modes of playback (locally and on VMs)

But if we aren't using reprozip's recording capability, we may be able to create an even simpler YAML format (possibly based on reprozip's, stripping out the many things we don't need) and use that in a Docker (or Ansible, Packer, etc.) configuration script.
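To make the Docker route concrete: a minimal sketch that renders a Dockerfile from such a simplified package list, using apt-get's name=version pinning. The spec format and dockerfile_from_spec are hypothetical; an exact pin only works if the image's apt sources still carry that version, which for old releases typically means pointing them at snapshot.debian.org or archive.debian.org first:

```python
def dockerfile_from_spec(base_image, packages):
    """Render a minimal Dockerfile that installs (optionally pinned) packages.

    `packages` is a list of {"name": ..., "version": ...} dicts; entries
    without a version get whatever the base image's repositories provide.
    """
    pins = []
    for pkg in packages:
        if pkg.get("version"):
            pins.append("%s=%s" % (pkg["name"], pkg["version"]))
        else:
            pins.append(pkg["name"])
    lines = [
        "FROM %s" % base_image,
        "RUN apt-get update && \\",
        "    apt-get install -y --no-install-recommends %s && \\" % " ".join(pins),
        "    rm -rf /var/lib/apt/lists/*",
    ]
    return "\n".join(lines) + "\n"


print(dockerfile_from_spec("debian:jessie",
                           [{"name": "libc6-dev", "version": "2.19-18+deb8u4"}]))
```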

Now that we have a bunch of potential tools on the table, I suggest we go back to our proposal, extract our goals for NICEMAN, and see if we can come up with a single-page treatment of the key requirements in light of what we now have on hand...

yarikoptic commented 8 years ago

;-) Correct, so see/add/... to that ReproNim A/CL PI document we have (I also just emailed the list about it), where I am sketching possible high-level use cases. Indeed, not everything in reprozip's trace file might be needed to reproduce the environment, BUT greedy me thinks that as long as we (or reprozip) trace execution, we should collect as much detail as possible. E.g., another aspect that reprozip (or NIDM, for that matter, AFAIK) doesn't trace ATM, although it could, is resource requirements. Even if not precise, they could provide a ballpark estimate of the necessary resources.
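A ballpark of that kind can be had cheaply on Unix. A sketch using the standard resource module around a child process (run_and_measure is a hypothetical helper, not something reprozip or NIDM provides):

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run `cmd` and return a rough resource summary for it (Unix only).

    CPU times are the difference in RUSAGE_CHILDREN before and after the
    run. ru_maxrss is the peak RSS of the largest child reaped so far
    (KiB on Linux, bytes on macOS), so it is only a ballpark.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    completed = subprocess.run(cmd)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        "user_seconds": after.ru_utime - before.ru_utime,
        "system_seconds": after.ru_stime - before.ru_stime,
        "max_rss": after.ru_maxrss,
    }


usage = run_and_measure([sys.executable, "-c", "x = bytearray(10**7)"])
print(usage)
```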

yarikoptic commented 7 years ago

This is already somewhat addressed in the codebase ATM: we are collecting an extended amount of information about deb packages/apt repos.