ozak closed this issue 5 years ago.
Some more info: the code runs, but it does not seem to process more than one input, so it is not actually doing anything in parallel.
I tried another function that appends many files, and things are even worse: it returns no errors, but it does not seem to process even one file.
```python
import os

# pathout5 (output folder), dfzip (DataFrame) and view (ipyparallel view)
# are defined earlier in the notebook.
def MergeNondecoded(z):
    folder = pathout5 + z + '/'
    files = os.listdir(folder)

    def stem(name):
        # File name without the .dta/.DTA extension
        return name.replace('.dta', '').replace('.DTA', '')

    fin = folder + files[0]
    fout = pathout5 + z + ".dta"
    # Tag the rows of the first file with its name
    identifier = 'gen filename = "' + stem(files[0]) + '"'
    # One append block per remaining file; each block tags its own rows
    myappend = "\n".join(
        'qui append using "' + folder + f + '", force\n'
        + 'capture drop s*\n'
        + 'replace filename = "' + stem(f) + '" if filename==""\n'
        + 'di "' + f + '"'
        for f in files[1:]
    )
    # Assemble the do-file directly: chained str.replace() on placeholder words
    # would also rewrite "fin"/"fout" wherever they occur in paths or file names.
    StataCommand = "\n".join([
        "set matsize 11000",
        "set maxvar 32000",
        'use "' + fin + '", clear',
        identifier,
        myappend,
        "compress",
        'save "' + fout + '", replace',
    ])
    get_ipython().run_cell_magic(u'stata', u'', StataCommand)
    return 0

results = view.map_async(MergeNondecoded, list(dfzip.DataInfo.unique()))
```
This returns a vector of zeros, but no files are created.
But running it serially:

```python
for z in dfzip.DataInfo.unique():
    MergeNondecoded(z)
```

works fine.
Hi Ozak!
Sorry for not getting back to you earlier, I just switched jobs so life is a bit hectic.
Running it in parallel probably doesn't work because I haven't made some of the temp files fully isolated, at least not for the "batch mode" functionality, so parallel runs can end up using the same files. I agree that this would be nice to have, but to be honest it is a cost (i.e., my time) versus benefit trade-off.
My recommendation would be to interact with Stata directly, without ipystata. This is actually simple, as it only requires a couple of changes:
You can see how I tell Python to run a .do file from the command line here: https://github.com/TiesdeKok/ipystata/blob/21049a4b0639aaf8cbda4a889cf4cd562c4b7d7d/ipystata/ipystata_magic_batch.py#L200-L217
Obviously, you would have to tell it where the Stata executable is.
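A minimal sketch of that approach, assuming a Unix-style Stata install that runs a do-file in batch mode via `stata -b do file.do`. The path `STATA_EXE` and the helpers `batch_command` and `run_stata` are hypothetical names for illustration, not part of ipystata. Each call writes its do-file into its own temporary directory, which sidesteps the shared temp-file problem mentioned above.

```python
import os
import subprocess
import tempfile

# Hypothetical path (assumption); point this at your own Stata executable.
STATA_EXE = "/usr/local/stata/stata-mp"


def batch_command(do_path, stata_exe=STATA_EXE):
    """Command line that runs a do-file in Stata batch mode (Unix `-b do` form)."""
    return [stata_exe, "-b", "do", do_path]


def run_stata(code, stata_exe=STATA_EXE, dry_run=False):
    """Write `code` to a do-file in a fresh temp directory and run it in batch mode.

    Every call gets its own directory, so concurrent calls never share the
    do-file or the log that batch mode writes to the working directory.
    Returns (workdir, command). With dry_run=True, Stata is not launched.
    """
    workdir = tempfile.mkdtemp(prefix="stata_")
    do_path = os.path.join(workdir, "job.do")
    with open(do_path, "w") as fh:
        fh.write(code + "\n")
    cmd = batch_command(do_path, stata_exe)
    if not dry_run:
        # Use the temp dir as working directory; raise if Stata exits non-zero
        subprocess.run(cmd, cwd=workdir, check=True)
    return workdir, cmd
```

With ipyparallel, a function like `run_stata` could then be mapped over the per-folder do-file text (e.g. with `view.map_async`), provided `os`, `subprocess` and `tempfile` are also imported on the engines. On Windows the executable name and batch-mode flags differ, so check the Stata documentation for your platform.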
Does this help?
I think I get the idea. Thanks!
Hi,
Is there a way to run this in parallel in a Jupyter ipyparallel session? I need to perform the same operation on many files, so I was planning to run it in multiple processes via ipyparallel. The issue is that it is not clear how to execute the `%%stata` cell magic on a client. Here's some code to get the idea.

The code inside the function `DecodeLabels` works fine, but not in the parallel execution. Any ideas?

Thanks for the great package!