glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models
http://glm-tools.github.io/pyglmnet/
MIT License

[MRG] Improve test coverage #253

Open jasmainak opened 6 years ago

jasmainak commented 6 years ago

Let's see if this helps with coverage ...

codecov-io commented 6 years ago

Codecov Report

Merging #253 into master will decrease coverage by 0.65%. The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #253      +/-   ##
==========================================
- Coverage   75.48%   74.82%   -0.66%     
==========================================
  Files           4        5       +1     
  Lines         673      719      +46     
  Branches      148      130      -18     
==========================================
+ Hits          508      538      +30     
- Misses        128      140      +12     
- Partials       37       41       +4
Impacted Files Coverage Δ
pyglmnet/utils.py 32.55% <0%> (-10.58%) ↓
pyglmnet/pyglmnet.py 79.67% <0%> (-1.41%) ↓
pyglmnet/datasets.py 81.37% <0%> (ø)
pyglmnet/base.py 48.48% <0%> (+3.32%) ↑


pavanramkumar commented 6 years ago

hmm, looks like it didn't budge. merge or close?

jasmainak commented 6 years ago

neither. Wait a bit. I'll give another try in a day or two :)

jasmainak commented 6 years ago

@pavanramkumar looks like the dataset fetchers don't work on python 3.5. Do you want to iterate on top of my pull request here? You can push directly to my branch if you want.

jasmainak commented 5 years ago

@pavanramkumar let's merge this? It will improve coverage a little ...

pavanramkumar commented 5 years ago

@jasmainak it's really strange why the dataset fetcher doesn't work with travis. when i run it on my local py35 environment, it works fine. perhaps a miniconda dependency issue?

py.test --cov=pyglmnet tests/test_pyglmnet.py -k 'test_fetch_datasets'
===================================================================== test session starts =====================================================================
platform darwin -- Python 3.5.4, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/pavanramkumar/Projects/pyglmnet, inifile:
plugins: cov-2.6.0
collected 8 items                                                                                                                                              

tests/test_pyglmnet.py .

---------- coverage: platform darwin, python 3.5.4-final-0 -----------
Name                   Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------
pyglmnet/__init__.py       4      0      0      0   100%
pyglmnet/base.py          66     56     34      0    10%
pyglmnet/metrics.py       21     21      6      0     0%
pyglmnet/pyglmnet.py     488    442    202      0     7%
pyglmnet/utils.py         43     34     10      0    17%
--------------------------------------------------------
TOTAL                    622    553    252      0     8%

===================================================================== 7 tests deselected ======================================================================
=========================================================== 1 passed, 7 deselected in 2.87 seconds ============================================================

jasmainak commented 5 years ago

That's weird. I am able to reproduce the Travis issue, though. Are you sure you are on the right branch? In your codecov output I don't see datasets.py.

(py35) mainak@mainak-ThinkPad-W540 ~/Desktop/projects/github_repos/pyglmnet $ pytest tests/test_pyglmnet.py -k 'test_fetch_datasets'
============================================================== test session starts ===============================================================
platform linux -- Python 3.5.3, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/mainak/Desktop/projects/github_repos/pyglmnet, inifile:
collected 9 items 

tests/test_pyglmnet.py F

==================================================================== FAILURES ====================================================================
______________________________________________________________ test_fetch_datasets _______________________________________________________________

    def test_fetch_datasets():
        """Test fetching datasets."""
        datasets.fetch_community_crime_data('/tmp/glm-tools')
>       datasets.fetch_group_lasso_datasets()

tests/test_pyglmnet.py:348: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def fetch_group_lasso_datasets():
        """
        Downloads and formats data needed for the group lasso example.

        Returns:
        --------
        design_matrix: pandas.DataFrame
            pandas dataframe with formatted data and labels

        groups: list
            list of group indicies, the value of the ith position in the list
            is the group number for the ith regression coefficient

        """
        try:
            import pandas as pd
        except ImportError:
            raise ImportError('The pandas module is required for the '
                              'group lasso dataset')

        # helper functions

        def find_interaction_index(seq, subseq,
                                   alphabet="ATGC",
                                   all_possible_len_n_interactions=None):
            n = len(subseq)
            alphabet_interactions = \
                [set(p) for
                 p in list(itertools.combinations_with_replacement(alphabet, n))]

            num_interactions = len(alphabet_interactions)
            if all_possible_len_n_interactions is None:
                all_possible_len_n_interactions = \
                    [set(interaction) for
                     interaction in
                     list(itertools.combinations_with_replacement(seq, n))]

            subseq = set(subseq)

            group_index = num_interactions * \
                all_possible_len_n_interactions.index(subseq)
            value_index = alphabet_interactions.index(subseq)

            final_index = group_index + value_index
            return final_index

        def create_group_indicies_list(seqlength=7,
                                       alphabet="ATGC",
                                       interactions=[1, 2, 3],
                                       include_extra=True):
            alphabet_length = len(alphabet)
            index_groups = []
            if include_extra:
                index_groups.append(0)
            group_count = 1
            for inter in interactions:
                n_interactions = comb(seqlength, inter)
                n_alphabet_combos = comb(alphabet_length,
                                         inter,
                                         repetition=True)

                for x1 in range(int(n_interactions)):
                    for x2 in range(int(n_alphabet_combos)):
                        index_groups.append(int(group_count))

                    group_count += 1
            return index_groups

        def create_feature_vector_for_sequence(seq,
                                               alphabet="ATGC",
                                               interactions=[1, 2, 3]):
            feature_vector_length = \
                sum([comb(len(seq), inter) *
                     comb(len(alphabet), inter, repetition=True)
                     for inter in interactions]) + 1

            feature_vector = np.zeros(int(feature_vector_length))
            feature_vector[0] = 1.0
            for inter in interactions:
                # interactions at the current level
                cur_interactions = \
                    [set(p) for p in list(itertools.combinations(seq, inter))]
                interaction_idxs = \
                    [find_interaction_index(
                     seq, cur_inter,
                     all_possible_len_n_interactions=cur_interactions) + 1
                     for cur_inter in cur_interactions]
                feature_vector[interaction_idxs] = 1.0

            return feature_vector

        positive_url = \
            "http://genes.mit.edu/burgelab/maxent/ssdata/MEMset/train5_hs"
        negative_url = \
            "http://genes.mit.edu/burgelab/maxent/ssdata/MEMset/train0_5_hs"

>       pos_file = tempfile.NamedTemporaryFile(bufsize=0)
E       TypeError: NamedTemporaryFile() got an unexpected keyword argument 'bufsize'

pyglmnet/datasets.py:203: TypeError
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------

...99%, 1 MB
...100%, 1 MB
=============================================================== 8 tests deselected ===============================================================
===================================================== 1 failed, 8 deselected in 2.85 seconds =====================================================
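For reference, the `bufsize` keyword in the traceback above is Python 2-only: Python 3 renamed it to `buffering` to match the io module, which is why the fetcher fails on py35. A minimal sketch of a Python 3-compatible call (not the actual patch in datasets.py):

```python
import tempfile

# In Python 2, tempfile.NamedTemporaryFile accepted a `bufsize` keyword;
# Python 3 renamed it to `buffering`. Unbuffered I/O (buffering=0) requires
# binary mode, which is already NamedTemporaryFile's default ('w+b').
pos_file = tempfile.NamedTemporaryFile(buffering=0)
pos_file.write(b"GTAAGT\n")   # splice-site-like line, purely illustrative
pos_file.seek(0)
first_line = pos_file.read()
pos_file.close()              # the temporary file is deleted on close
```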

jasmainak commented 5 years ago

arfff ... now the website hosting the community crime data seems to be down. So Travis won't pass ...