cgat-developers / cgat-flow

cgat-flow repository
MIT License
13 stars 9 forks source link

Js rnaseqdiffexpression.py #131

Closed Acribbs closed 3 years ago

Acribbs commented 3 years ago

Hi @jscaber Looks good to me.

Just created this pull request to make comments.

Great removing rpy2, this is actually how I develop all my pipelines now, debugging is a nightmare with rpy2. I try and stick to native R scripts and call them using statements.

Only comment would be; wouldn't io.R not be better located within one of the task libraries since it seems to be single-cell specific?

Also would be great if this could be documented within the docs

Acribbs commented 3 years ago

The tests seem to be timing out too, I don't know why this is happening because I cant see that you have modified the dependancies. Maybe im missing something

jscaber commented 3 years ago

Thanks, in the first instance I have restarte the last master build that did work (2 months ago), lets see what happens.

jscaber commented 3 years ago

Yes, as expected, master is now failing also, must be a package dependency that's changed availability in some way. Regarding io.R, yes will delete it, this is a remnant from the original code that I started out with.

Acribbs commented 3 years ago

@jscaber wondering if it could be because of python 3.8 and we may need to pin to python 3.7 for the time being. I have had some issues moving to new version of python

jscaber commented 3 years ago

Looks like another rpy2 incompatibility. This is trying to install the master branch on the cluster.

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (jinja2):

  - beautifulsoup4 -> python[version='>=3.6'] -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - httplib2 -> python -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - intermine -> python -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - mygene -> python -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - nomkl
  - python[version='>=3.6'] -> pip -> setuptools
  - python[version='>=3.6'] -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8']
  - rpy2[version='<=2.9.4'] -> jinja2 -> setuptools
  - rpy2[version='<=2.9.4'] -> python[version='>=3.6,<3.7.0a0'] -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - rpy2[version='<=2.9.4'] -> r-base[version='>=3.5,<3.6.0a0'] -> libpng[version='1.6.*|>=1.6.22,<1.6.31|>=1.6.32,<1.6.35|>=1.6.34,<1.7.0a0|>=1.6.35,<1.7.0a0|>=1.6.37,<1.7.0a0|>=1.6.28,<1.7|>=1.6.27,<1.7|>=1.6.32,<1.7.0a0']
  - rpy2[version='<=2.9.4'] -> six
  - toposort -> python -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
  - tzlocal -> python -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0|1.2.8|1.2.11.*']
jscaber commented 3 years ago

Not surprising in a way given that we have pinned a requirement to a package version from 2018 (v2.9). If we unpinned, rpy2 v3 has significant changes in the way it operates, and it's beyond me to repair all the instances (such as in the peakcalling pipeline).

sebastian-luna-valero commented 3 years ago

Hi,

Could you paste the command you used to install the code, please?

Best regards, Sebastian

jscaber commented 3 years ago

./install-devel.sh --install-repo --install-pipeline-dependencies --clone-from-repo --location /path/to/dir/cgat-developers-v0 >install.log

sebastian-luna-valero commented 3 years ago

Hi,

Please have a look at this option:

https://github.com/cgat-developers/cgat-flow/blob/master/install.sh#L732

IIRC it used to be printed out with the "--help" option of "install.sh", which is quite helpful.

Best regards, Sebastian

jscaber commented 3 years ago

Thanks, you mean for adjusting the dependencies and testing right? I'll give it a go.

I think it's this one for install-devel. https://github.com/cgat-developers/cgat-flow/blob/97c4aa964b77091e025277d58a61b028d3741335/install-devel.sh#L1096

sebastian-luna-valero commented 3 years ago

Great, you are right!

Acribbs commented 3 years ago

@jscaber have moved to GitHub action tests. If you're happy with this then merge when your happy.

jscaber commented 3 years ago

Thanks a lot for the great work recently. I'm happy to merge this for now but there are some limitations of the code:

Overall however it does run quite well, and so far much better than the previous code.

jscaber commented 3 years ago

Finally, I couldn't completely remove Tom's old classes and libraries, because the peakcalling pipeline still depends on it, and I never had the time to get acquainted with peakcalling.

Not sure how many people still use the cgatflow library (probably none or very few).

Acribbs commented 3 years ago

Hi @jscaber no problem.

I certainly use it for annotations, mapping, rnaseqdiffexpress and peak calling, but not much else to be fair.

jscaber commented 3 years ago

Yes, that's what I use too. I think this is an improvement on the code that we have, so will merge. This only affects the diffexpression pipeline beyond the point where counts tables are generated to won't affect most existing workflows (My understanding is that most people would take the counts table or salmon output and continue elsewhere with that).

It will probably benefit from some tests etc, and I'll have a look at that & once you use it let me know and am happy to help adjust the code to fit other organisms etc.

jscaber commented 3 years ago

Further rpy2 cleanup and a further cleanup of unused scripts and tools might be helpful so that we can concentrate on those pipelines that we are able to maintain to a very basic level here, but then that's a discussion for a different time I guess (it always is).