Argument validation for input maf dataframe

weinstockj commented 4 years ago

Hi SignatureAnalyzer developers,

I attempted to run sa.run_maf with a pandas DataFrame as the first argument instead of a string. Doing so resulted in the following error:

"Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>"

Inspecting the source shows that the object is passed directly to pd.read_csv (https://github.com/broadinstitute/getzlab-SignatureAnalyzer/blob/master/signatureanalyzer/signatureanalyzer.py#L85).

I tried this approach (a DataFrame argument instead of a file path) because the README example seemed to suggest it:

# Run array of decompositions with mutational signature processing
sa.run_maf(input.maf, outdir='./ardnmf_output/', cosmic='cosmic2', hg_build='./ref/hg19.2bit', nruns=10)

Thanks for developing SignatureAnalyzer,
Josh

shankara-a commented 4 years ago

Hi,

The run_maf function does some pre-processing specific to the Mutation Annotation Format (MAF): it parses the inputs, generates the mutational spectra, and so on. Currently, the Python API expects this input as a string containing the path to a file.
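As a rough illustration of this kind of pre-processing, the sketch below reads a tab-separated MAF and counts single-base substitutions per sample in the six pyrimidine-collapsed classes. It is a simplified stand-in, not SignatureAnalyzer's code: it assumes the standard MAF columns Tumor_Sample_Barcode, Variant_Type, Reference_Allele and Tumor_Seq_Allele2, and it ignores trinucleotide context, which typically requires a reference genome such as the hg_build file in the README call. The function name sbs6_spectra is hypothetical.

```python
import io

import pandas as pd

# Complement map used to fold purine reference bases onto the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SBS6 = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs6_spectra(maf):
    """Return a (6 x n_samples) table of SNV counts from a MAF path or buffer.

    Simplified sketch: only SNP rows are used and no sequence context is added.
    """
    df = pd.read_csv(maf, sep="\t", comment="#", low_memory=False)
    snvs = df[df["Variant_Type"] == "SNP"].copy()

    ref = snvs["Reference_Allele"]
    alt = snvs["Tumor_Seq_Allele2"]
    # Express every substitution relative to a pyrimidine (C or T) reference.
    purine = ref.isin(["A", "G"])
    ref = ref.where(~purine, ref.map(COMPLEMENT))
    alt = alt.where(~purine, alt.map(COMPLEMENT))
    snvs["sbs"] = ref + ">" + alt

    counts = pd.crosstab(snvs["sbs"], snvs["Tumor_Sample_Barcode"])
    # Keep all six classes as rows, filling classes that never occur with 0.
    return counts.reindex(SBS6, fill_value=0)


# Usage with a tiny made-up MAF; the DEL row is skipped by the SNP filter.
example = io.StringIO(
    "Tumor_Sample_Barcode\tVariant_Type\tReference_Allele\tTumor_Seq_Allele2\n"
    "S1\tSNP\tC\tT\n"
    "S1\tSNP\tG\tA\n"
    "S2\tSNP\tA\tC\n"
    "S2\tDEL\tA\t-\n"
)
spectra = sbs6_spectra(example)
print(spectra)
```

Here G>A in sample S1 is folded onto C>T, and A>C in S2 becomes T>G, which is why the pyrimidine convention yields six classes rather than twelve.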

However, if you only want to run the algorithm itself, you can call bnmf directly; it is defined here: https://github.com/broadinstitute/getzlab-SignatureAnalyzer/blob/7e18db8bf84aae8e0273b1052df84ec969c8c5de/signatureanalyzer/bnmf.py#L20

It accepts a DataFrame. In the future, we could extend the Python API to accept either a file path or a DataFrame.
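An API that accepts either a file path or a DataFrame could look roughly like the sketch below. The helper name load_maf is hypothetical and not part of SignatureAnalyzer. It validates the argument type up front, so an unsupported input fails with an explicit TypeError instead of the pd.read_csv error reported above.

```python
import os

import pandas as pd


def load_maf(maf):
    """Hypothetical helper: return a MAF as a DataFrame from a path or a DataFrame."""
    if isinstance(maf, pd.DataFrame):
        # Already parsed: copy so later pre-processing cannot mutate the caller's data.
        return maf.copy()
    if isinstance(maf, (str, os.PathLike)):
        # MAF files are tab-separated; comment="#" skips '#version'-style header
        # lines (note it also truncates any field text after a '#').
        return pd.read_csv(maf, sep="\t", comment="#", low_memory=False)
    raise TypeError(
        "maf must be a file path (str or os.PathLike) or a pandas DataFrame, "
        f"got {type(maf).__name__}"
    )
```

Dispatching on the type once, at the entry point, keeps the downstream pre-processing working on a single representation (a DataFrame) regardless of how the caller supplied the data.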

Best,

Shankara

weinstockj commented 4 years ago

Great, thanks so much for your consideration. My suggestion was aimed more at the documentation than at a feature request.

shankara-a commented 4 years ago

Thanks, we can clarify this in the README. Currently it does not specify what input.maf is.

getzlab / SignatureAnalyzer, issue #13